C Pointer Principle - AT&T Assembly

Posted by northstjarna on Fri, 17 May 2019 17:05:51 +0200

Assembly plays a more significant role on Linux than on Windows: parts of the Linux kernel are written in assembly. However, most Linux programmers have only been exposed to assembly language under DOS/Windows before, and that code is all Intel-style. On Unix and Linux systems the AT&T format is more commonly used, and the syntax of the two formats differs considerably. We should therefore gain a basic familiarity with AT&T assembly.

We write a simple helloworld program in C under Linux and save it as hello.c.

#include <stdio.h>
#include <stdlib.h>   /* declares exit() */

int main(void)
{
    printf("hello,world\n");
    exit(0);
}

Then we compile it with GCC, using the -S option to generate the intermediate assembly code, so we can see what AT&T assembly really looks like.

gcc -S  hello.c

    .file   "hello.c"
    .section    .rodata
.LC0:
    .string "hello,world"
    .text
.globl main
    .type   main, @function
main:
    pushl   %ebp
    movl    %esp, %ebp
    andl    $-16, %esp
    subl    $16, %esp
    movl    $.LC0, (%esp)
    call    puts
    movl    $0, (%esp)
    call    exit
    .size   main, .-main
    .ident  "GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"
    .section    .note.GNU-stack,"",@progbits

An assembler converts a source program written in assembly language into binary object code. The standard assembler on Linux is GAS (the GNU assembler), which GCC relies on as its back end and which is usually shipped in the binutils package.
AT&T assembly syntax has the following characteristics:
1. Register names are prefixed with '%'.

For example, to copy the contents of the eax register into ebx:

movl %eax,%ebx
2. Immediate operands are prefixed with '$'.

For example, to copy 1 into eax:

movl $1, %eax
3. The destination operand is to the right of the source operand.

movl %eax,%ebx
Here eax is the source operand and ebx is the destination operand.
4. The operand size is determined by the last letter of the instruction mnemonic. The suffixes 'b', 'w', and 'l' denote byte (8-bit), word (16-bit), and long (32-bit) operands, respectively.

For example:

movl operates on 32 bits and copies the contents of the eax register into ebx:

movl %eax, %ebx

movw operates on 16 bits and copies the contents of the ax register into bx:

movw %ax, %bx

movb operates on 8 bits and copies the contents of the al register into bl:

movb %al, %bl

Let's take pushing onto the stack as an example:

pushl %ecx     # push the 32-bit contents of ecx

pushw %cx      # push the 16-bit contents of cx

pushl $80      # push the immediate value 80 as a 32-bit integer

pushl data     # push the 32-bit contents of the variable data

pushl $data    # push the address of the variable data

The last form, pushl $data, is special: putting $ before a variable name takes the address of the variable, so the address of data is pushed onto the stack rather than its contents.
5. The operand of an indirect jump or call is prefixed with '*'.
6. The far (intersegment) jump and far call instructions are written ljmp and lcall.
We can see several of these characteristics in the compiler-generated assembly code.

Let's take another look at a helloworld program written in AT&T assembly.

.section .data                 # initialized data
output:
   .ascii "hello,world\n"
   # The string to print. output is a label marking where the string begins; .ascii is the data type.
.section .bss                  # uninitialized data, zero-filled buffer
   .lcomm num,20
   # .lcomm reserves a local memory area that cannot be accessed outside this assembly file; .comm reserves a common memory area.
.section .text                 # assembly instruction code
   .globl _start               # program entry point
_start:
   movl $4,%eax                # system call number: 4 is write
   movl $output,%ecx           # address of the string to print
   movl $1,%ebx                # file descriptor: 1 is standard output (the screen)
   movl $12,%edx               # string length in bytes
   int $0x80                   # print the string hello,world

   movl $0,%eax                # clear eax
   movl $num,%edi              # edi = address of the num buffer
   movb $65,0(%edi)            # ASCII code of A
   movb $66,1(%edi)            # ASCII code of B
   movb $67,2(%edi)            # ASCII code of C
   movb $68,3(%edi)            # ASCII code of D
   movb $10,4(%edi)            # ASCII code of \n

   movl $4,%eax                # system call number: 4 is write
   movl $num,%ecx              # address of the buffer to print
   movl $1,%ebx                # file descriptor: 1 is standard output (the screen)
   movl $5,%edx                # length: four letters plus the newline
   int $0x80                   # print the string ABCD

   movl $1,%eax                # system call number: 1 is exit
   movl $0,%ebx                # exit code returned to the shell

   int $0x80                   # software interrupt into the kernel: exit

The structure and content of the assembly code above are as follows:
1. The .section .data segment stores initialized variables; the .section .bss segment stores uninitialized variables.
2. Variables are defined in the following format:
variable_name:
   .type value
This is how the output variable in the code above is defined:
output:
   .ascii "hello,world\n"
The following example defines several variables:
.section .data
msg:
   .ascii "This is a text"
x:
   .double 109.45, 2.33, 19.16
y:
   .int 89
z:
   .int 21, 85, 27

.equ a, 8

Here msg is a text string, x holds double-precision floating-point numbers, and y and z are integers (z holds three of them). a is a special case: it is a constant symbol rather than a variable, defined with .equ name, value.
3. Memory areas in .section .bss are declared as follows:
.lcomm reserves a local memory area, i.e. one that cannot be accessed from outside the current assembly file, while .comm reserves a common memory area.
For example, in the definition above,
   .lcomm num,20
num is a local memory area of 20 bytes.
4. .section .text holds the assembly instructions, and the label exported with .globl _start is the program entry point.
5. # starts a comment. The rest of the code above is commented, so programmers with some assembly background should find it easy to follow.
 
The following data types (directives) are available:
.ascii   text string
.asciz   text string terminated with a NUL byte
.byte    byte value (8 bits)
.double  double-precision floating-point number
.float   single-precision floating-point number
.int     32-bit integer
.long    32-bit integer (same as .int)
.octa    16-byte integer
.quad    8-byte integer
.short   16-bit integer
.single  single-precision floating-point number (same as .float)
 
In addition, AT&T assembly code often uses operations such as byte-order reversal (bswap), compare-and-exchange (cmpxchg), exchange (xchg), and pushing or popping all general-purpose registers (pusha/popa).

The data elements defined in the .bss segment are uninitialized variables; their memory is reserved and zero-filled at run time rather than stored in the executable file.

These areas come in two kinds: the common memory area (.comm) and the local common memory area (.lcomm).

The local common memory area cannot be accessed from outside the local assembly code.

The .text segment stores the code.
