Introduction to assembly language

Posted by yitanpaocai on Wed, 02 Feb 2022 20:45:55 +0100

Introduction to assembly language 1: Environment preparation

Execution:

sudo apt-get update

Execution:

sudo apt-get install nasm

Check:

New file: vi t.c

int main() {
    return 0;
}

New file: first.asm

global main

main:
    mov eax, 0
    ret

Assemble and link to produce the executable first
(32-bit system):

$ nasm -f elf first.asm -o first.o
$ gcc -m32 first.o -o first

64 bit system:

$ nasm -f elf64 first.asm -o first.o
$ gcc -m64 first.o -o first

Run:

$ ./first ; echo $?

The shell then prints the exit status of first, which is the value main left in eax (0 for this program).

Introduction to assembly language II: enjoy the environment first

New file: vi test.asm

global main

main:
    mov eax, 1
    mov ebx, 2
    add eax, ebx
    ret

Compile run:

dontla@dontla-virtual-machine:~/desktop/test$ nasm -f elf64 test.asm -o test.o
dontla@dontla-virtual-machine:~/desktop/test$ gcc -m64 test.o -o test
dontla@dontla-virtual-machine:~/desktop/test$ ./test ; echo $?
3
dontla@dontla-virtual-machine:~/desktop/test$ 

Exercise 1:

global main

main:
    mov eax, 1
    add eax, 2
    add eax, 3
    add eax, 4
    add eax, 5
    ret

result:

dontla@dontla-virtual-machine:~/desktop/test$ nasm -f elf64 test.asm -o test.o
dontla@dontla-virtual-machine:~/desktop/test$ gcc -m64 test.o -o test
dontla@dontla-virtual-machine:~/desktop/test$ ./test ; echo $?
15

Exercise 2:

global main

main:
    mov eax, 1
    mov ebx, 2
    mov ecx, 3
    mov edx, 4
    add eax, ebx
    add eax, ecx
    add eax, edx
    ret

result:

dontla@dontla-virtual-machine:~/desktop/test$ nasm -f elf64 test.asm -o test.o
dontla@dontla-virtual-machine:~/desktop/test$ gcc -m64 test.o -o test
dontla@dontla-virtual-machine:~/desktop/test$ ./test ; echo $?
10

Introduction to simple instructions

mov

The mov instruction transfers data. We can use it as follows:

mov eax, 1          ; set eax to 1 (eax = 1)
mov ebx, 2          ; set ebx to 2 (ebx = 2)
mov ecx, eax        ; copy the value of eax into ecx (ecx = eax)

add

Addition instruction

add eax, 2          ; eax = eax + 2
add ebx, eax        ; ebx = ebx + eax

ret

Return instruction, similar to return in C language, is used for return after function call (described in detail later).

sub

Subtraction instruction (similar to addition instruction)

sub eax, 1              ; eax = eax - 1
sub eax, ecx            ; eax = eax - ecx
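
To make mov, add and sub concrete, here is a minimal C sketch that treats each register as a plain 32-bit variable and replays Exercise 2 one instruction at a time. The function names exercise_2 and sub_demo are made up for illustration; real registers are of course not C variables.

```c
#include <stdint.h>

/* Replays Exercise 2, modelling each register as a 32-bit variable.
   The function name is hypothetical; it only mirrors the listing. */
uint32_t exercise_2(void) {
    uint32_t eax, ebx, ecx, edx;
    eax = 1;      /* mov eax, 1 */
    ebx = 2;      /* mov ebx, 2 */
    ecx = 3;      /* mov ecx, 3 */
    edx = 4;      /* mov edx, 4 */
    eax += ebx;   /* add eax, ebx */
    eax += ecx;   /* add eax, ecx */
    eax += edx;   /* add eax, edx */
    return eax;   /* ret: the caller sees eax */
}

/* The two sub forms from the text, starting from eax = 10 and ecx = 3. */
uint32_t sub_demo(void) {
    uint32_t eax = 10, ecx = 3;
    eax -= 1;     /* sub eax, 1 */
    eax -= ecx;   /* sub eax, ecx */
    return eax;
}
```

Calling exercise_2() gives the same value the shell printed for Exercise 2.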

More registers

In addition to eax, ebx, ecx and edx listed above, there are also some registers:

esi
edi
ebp

eax, ebx, ecx and edx are general-purpose registers, which can hold any data and take part in most operations. The remaining three are more commonly used for memory access, but for now you can pick any of them and use it freely.

Introduction to assembly language 3: it's time to go to memory

Relationship between cpu, register and memory

Pointer and memory

Register and memory

Memory sits outside the CPU, while registers are inside it. A CPU has only a limited number of registers, because adding more makes it more expensive.

Assign the value of the register to memory

mov [0x0699], eax
mov [0x0998], ebx
mov [0x1299], ecx
mov [0x1499], edx
mov [0x1999], esi

Assign the value of memory to the register

mov eax, [0x0699]
mov ebx, [0x0998]
mov ecx, [0x1299]
mov edx, [0x1499]
mov esi, [0x1999]

Hands on programming

global main

main:
    mov ebx, 1
    mov ecx, 2
    add ebx, ecx
    
    mov [sui_bian_xie], ebx
    mov eax, [sui_bian_xie]
    
    ret

section .data
sui_bian_xie   dd    0

result:

dontla@dontla-virtual-machine:~/desktop/test$ nasm -f elf64 test.asm -o test.o
dontla@dontla-virtual-machine:~/desktop/test$ gcc -m64 test.o -o test -no-pie
dontla@dontla-virtual-machine:~/desktop/test$ ./test ; echo $?
3
dontla@dontla-virtual-machine:~/desktop/test$ ls -lh
 Total consumption 24 K
-rwxrwxr-x 1 dontla dontla 16K 6 March 22:32 test
-rw-rw-r-- 1 dontla dontla 151 6 March 22:08 test.asm
-rw-rw-r-- 1 dontla dontla 848 6 March 22:32 test.o
dontla@dontla-virtual-machine:~/desktop/test$ 

Note the -no-pie option on the gcc line. Without it, linking fails with an error like this:

relocation R_X86_64_32S against `.data' can not be used when making a PIE object; recompile with -fP

Resolution:

mov ebx, 1                   ; set ebx to 1
mov ecx, 2                   ; set ecx to 2
add ebx, ecx                 ; ebx = ebx + ecx
    
mov [sui_bian_xie], ebx      ; save the value of ebx to memory
mov eax, [sui_bian_xie]      ; read the saved value back and put it in eax
    
ret                          ; return; the return value of the whole program is the value in eax

Note: the value in the eax register when the program returns becomes the exit status of the whole program. This is a convention of the environment we are currently using, and we follow it.
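
One detail worth knowing when checking results with echo $?: on Linux the shell reports only the low 8 bits of the exit status, so values above 255 wrap around. A minimal sketch of that rule, with a made-up helper name:

```c
#include <stdint.h>

/* What `echo $?` would print for a given eax at return, assuming a Linux
   shell, where only the low 8 bits of the exit status are reported.
   The function name is hypothetical. */
unsigned shell_exit_status(uint32_t eax) {
    return eax & 0xFFu;
}
```

For the small values used in this article (3, 15, 10, 30) the status equals eax, but a program that returns 300 would show 44.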

These two lines of code:

section .data
sui_bian_xie   dd    0

The first line says that what follows goes into the data section of the executable, and the corresponding memory is allocated when the program starts. (When the registers are not enough, we set aside memory as temporary storage.)

The second line describes the actual data. It reserves space and fills it with 0. Be careful with the size: in NASM, dw defines a word (2 bytes) and dd defines a double word (4 bytes). Since the code moves 32-bit registers such as ebx and eax to and from this location, it needs 4 bytes, so dd is the right choice; with dw, each 32-bit store would spill 2 bytes past the variable. The name sui_bian_xie in front (pinyin for "write whatever you like") is just a label that you choose freely so you can tell your data apart when writing code. The assembler turns sui_bian_xie into a concrete address. We don't need to care what that address is, only that every occurrence of sui_bian_xie refers to the same location.

Crazy code writing

global main

main:
    mov ebx, [number_1]
    mov ecx, [number_2]
    add ebx, ecx
    
    mov [result], ebx
    mov eax, [result]
    
    ret

section .data
number_1      dd        10
number_2      dd        20
result        dd        0

result:

dontla@dontla-virtual-machine:~/desktop/test$ nasm -f elf64 test.asm -o test.o
dontla@dontla-virtual-machine:~/desktop/test$ gcc -m64 test.o -o test -no-pie
dontla@dontla-virtual-machine:~/desktop/test$ ./test ; echo $?
30
dontla@dontla-virtual-machine:~/desktop/test$ 
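
The size of each data slot matters more than the output suggests. In NASM a dw slot is 2 bytes and a dd slot is 4 bytes, while a mov with a 32-bit register always touches 4 bytes. The C sketch below models the data section as a byte array and replays the program with the three variables packed either 2 or 4 bytes apart. Everything here (add_numbers, the width parameter, the helper functions) is a hypothetical model, not something the assembler produces.

```c
#include <assert.h>
#include <stdint.h>

/* Write the low n bytes of v in little-endian order, as x86 does. */
static void store_le(uint8_t *mem, uint32_t v, int n) {
    for (int i = 0; i < n; i++) mem[i] = (uint8_t)(v >> (8 * i));
}

/* Read 4 bytes in little-endian order (a 32-bit mov from memory). */
static uint32_t load32(const uint8_t *mem) {
    uint32_t v = 0;
    for (int i = 0; i < 4; i++) v |= (uint32_t)mem[i] << (8 * i);
    return v;
}

/* Hypothetical model: width = 2 mimics dw slots, width = 4 mimics dd slots.
   Returns the value eax holds at ret. */
uint32_t add_numbers(int width) {
    assert(width == 2 || width == 4);
    uint8_t data[16] = {0};
    uint8_t *number_1 = data;
    uint8_t *number_2 = data + width;
    uint8_t *result   = data + 2 * width;
    store_le(number_1, 10, width);    /* number_1  10 */
    store_le(number_2, 20, width);    /* number_2  20 */
    store_le(result, 0, width);       /* result    0  */

    uint32_t ebx = load32(number_1);  /* mov ebx, [number_1] */
    uint32_t ecx = load32(number_2);  /* mov ecx, [number_2] */
    ebx += ecx;                       /* add ebx, ecx */
    store_le(result, ebx, 4);         /* mov [result], ebx */
    return load32(result);            /* mov eax, [result] */
}
```

With 4-byte slots eax is exactly 30. With 2-byte slots the 32-bit load of number_1 also picks up number_2 in its upper half, so eax becomes 0x0014001E; the shell still prints 30 only because it shows the low 8 bits.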

Disassembly

First check whether gdb, the tool we will use for disassembly, is installed:

dontla@dontla-virtual-machine:~/desktop/test$ which gdb
/usr/bin/gdb
dontla@dontla-virtual-machine:~/desktop/test$ 

If it is not installed:

$ sudo apt-get install gdb -y

Create test.asm:

global main

main:
    mov eax, 1
    mov ebx, 2
    add eax, ebx
    ret

compile:

dontla@dontla-virtual-machine:~/desktop/test$ nasm -f elf64 test.asm -o test.o
dontla@dontla-virtual-machine:~/desktop/test$ gcc -m64 test.o -o test

Run:

dontla@dontla-virtual-machine:~/desktop/test$ ./test ; echo $?
3

Disassemble the test file with gdb (disassemble the machine language into assembly language):

dontla@dontla-virtual-machine:~/desktop/test$ gdb ./test
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./test...
(No debugging symbols found in ./test)
(gdb) 

Switch the disassembly output to Intel syntax:

(gdb) set disassembly-flavor intel
(gdb) 

continue:

(gdb) disas main
Dump of assembler code for function main:
   0x0000000000001130 <+0>:	mov    eax,0x1
   0x0000000000001135 <+5>:	mov    ebx,0x2
   0x000000000000113a <+10>:	add    eax,ebx
   0x000000000000113c <+12>:	ret    
   0x000000000000113d <+13>:	nop    DWORD PTR [rax]
End of assembler dump.
(gdb) 

Dynamic debugging

Break point:

(gdb) break *0x0000000000001135
Breakpoint 1 at 0x1135

Run: an error is reported

(gdb) run
Starting program: /home/dontla/desktop/test/test 
Warning:
Cannot insert breakpoint 1.
Cannot access memory at address 0x1135

(gdb) 

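Cause of "Cannot insert breakpoint 1. Cannot access memory at address 0x1135": the executable is position-independent (PIE), so the addresses gdb shows before the program starts are only offsets within the file. Once the program has been loaded, the code sits at a different address (0x0000555555555130 in the listing further down), so the breakpoint must be set on the runtime address, for example by running the program once and then repeating disas main.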

To view the value of the eax register:

(gdb) info registers eax 
eax            0x1                 1

To view the value of the ebx register:

(gdb) info registers ebx
ebx            0x55555140          1431654720

Execute the next instruction and look at the value of ebx:

(gdb) stepi
0x000055555555513a in main ()
(gdb) info registers ebx
ebx            0x2                 2

Continue single-stepping and check the value. Enter disas to see which instruction the program has reached:

(gdb) stepi 
0x000055555555513c in main ()
(gdb) info registers eax
eax            0x3                 3
(gdb) disas
Dump of assembler code for function main:
   0x0000555555555130 <+0>:	mov    $0x1,%eax
   0x0000555555555135 <+5>:	mov    $0x2,%ebx
   0x000055555555513a <+10>:	add    %ebx,%eax
=> 0x000055555555513c <+12>:	retq   
   0x000055555555513d <+13>:	nopl   (%rax)
End of assembler dump.
(gdb) 

If you want the program to run directly to the end, enter the instruction continue:

(gdb) continue
Continuing.
[Inferior 1 (process 32774) exited with code 03]
(gdb) 

Introduction to assembly language 4: get through C and assembly language

Interlude: the relationship between C language and assembly language

Create test.c:

int x, y, z;

int main() {
    x = 2;
    y = 3;
    z = x + y;
    return z;
}

Compile execution output:

dontla@dontla-virtual-machine:~/desktop/test1$ gcc test.c -o test
dontla@dontla-virtual-machine:~/desktop/test1$ ./test ; echo $?
5
dontla@dontla-virtual-machine:~/desktop/test1$ 

The assembly code equivalent to the above c code is as follows:

global main

main:
    mov eax, 2
    mov [x], eax
    mov eax, 3
    mov [y], eax
    mov eax, [x]
    mov ebx, [y]
    add eax, ebx
    mov [z], eax
    mov eax, [z]
    ret


section .data
x       dd      0
y       dd      0
z       dd      0

Why do you want to save it, take it out and save it again? It's too troublesome!

Why not just do this?

global main 
 
main: 
    mov [x], 2 
    mov [y], 3 
    add [x], [y] 
    mov [z], [x] 
    ret 
    	
section .data 
x       dd      0 
y       dd      0 
z       dd      0 

This fails to assemble:

dontla@dontla-virtual-machine:~/desktop/test1$ nasm -f elf64 test.asm  -o test.o
test.asm:4: error: operation size not specified
test.asm:5: error: operation size not specified
test.asm:6: error: invalid combination of opcode and operands
test.asm:7: error: invalid combination of opcode and operands

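Why does this fail? Can't memory be operated on directly? It can, within limits. The first two errors (lines 4 and 5) appear because the assembler cannot tell how many bytes to write when an immediate value goes to memory; a size keyword such as dword fixes that, for example mov dword [x], 2. The last two errors (lines 6 and 7) come from a hardware rule: an x86 instruction such as mov or add cannot take two memory operands, so at least one side must be a register or an immediate value.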

Change:

global main 
 
main: 
    mov eax, 2
    mov [x], eax 
    mov eax, [x]
    
    mov ebx, 3
    mov [y], ebx
    mov ebx, [y]
      
    add eax, ebx 
    mov [z], eax 
    ret 
    	
section .data 
x       dd      0 
y       dd      0 
z       dd      0 

Operation results:

dontla@dontla-virtual-machine:~/desktop/test1$ nasm -f elf64 test.asm  -o test.o
dontla@dontla-virtual-machine:~/desktop/test1$ gcc -m64 test.o -o test -no-pie
dontla@dontla-virtual-machine:~/desktop/test1$ ./test ; echo $?
5

Note: the return value is whatever eax holds when ret executes, not the result of the last instruction before it.

Note: in vi, to select everything and delete it, press Esc and then type ggVGd.

Uncover the true face of a C program (write the same program in C and in assembly, build both executables, and disassemble them with gdb)

Prepare two codes:

test1.c

int x, y, z;

int main() {
    x = 2;
    y = 3;
    z = x + y;
    return z;
}

test2.asm

global main

main:
    mov eax, 2
    mov [x], eax
    mov eax, 3
    mov [y], eax
    mov eax, [x]
    mov ebx, [y]
    add eax, ebx
    mov [z], eax
    mov eax, [z]
    ret


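section .data
; dw reserves only 2 bytes per variable, although the code uses 32-bit moves;
; dd (4 bytes) would be the correct size. It is kept as dw here because the
; disassembly shown later was recorded from this version of the file.
x       dw      0
y       dw      0
z       dw      0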

Compile and generate executable files test1 and test2 respectively:

dontla@dontla-virtual-machine:~/desktop/test$ gcc -m64 test1.c -o test1
dontla@dontla-virtual-machine:~/desktop/test$ nasm -f elf64 test2.asm -o test2.o
dontla@dontla-virtual-machine:~/desktop/test$ gcc -m64 -fno-lto test2.o -o test2 -no-pie
dontla@dontla-virtual-machine:~/desktop/test$ ls
test1  test1.c  test2  test2.asm  test2.o
dontla@dontla-virtual-machine:~/desktop/test$ 

1. View the disassembly code of test1 file with gdb:

dontla@dontla-virtual-machine:~/desktop/test$ gdb ./test1
Reading symbols from ./test1...
(No debugging symbols found in ./test1)
(gdb) set disassembly-flavor intel
(gdb) disas main
Dump of assembler code for function main:
   0x0000000000001129 <+0>:	endbr64 
   0x000000000000112d <+4>:	push   rbp
   0x000000000000112e <+5>:	mov    rbp,rsp
   0x0000000000001131 <+8>:	mov    DWORD PTR [rip+0x2edd],0x2        # 0x4018 <x>
   0x000000000000113b <+18>:	mov    DWORD PTR [rip+0x2ed7],0x3        # 0x401c <y>
   0x0000000000001145 <+28>:	mov    edx,DWORD PTR [rip+0x2ecd]        # 0x4018 <x>
   0x000000000000114b <+34>:	mov    eax,DWORD PTR [rip+0x2ecb]        # 0x401c <y>
   0x0000000000001151 <+40>:	add    eax,edx
   0x0000000000001153 <+42>:	mov    DWORD PTR [rip+0x2ebb],eax        # 0x4014 <z>
   0x0000000000001159 <+48>:	mov    eax,DWORD PTR [rip+0x2eb5]        # 0x4014 <z>
   0x000000000000115f <+54>:	pop    rbp
   0x0000000000001160 <+55>:	ret    
End of assembler dump.
(gdb) 

2. Check the disassembly code of test2 with gdb:

dontla@dontla-virtual-machine:~/desktop/test$ gdb ./test2
Reading symbols from ./test2...
(No debugging symbols found in ./test2)
(gdb) set disassembly-flavor intel
(gdb) disas main
Dump of assembler code for function main:
   0x0000000000401110 <+0>:	mov    eax,0x2
   0x0000000000401115 <+5>:	mov    DWORD PTR ds:0x404028,eax
   0x000000000040111c <+12>:	mov    eax,0x3
   0x0000000000401121 <+17>:	mov    DWORD PTR ds:0x40402a,eax
   0x0000000000401128 <+24>:	mov    eax,DWORD PTR ds:0x404028
   0x000000000040112f <+31>:	mov    ebx,DWORD PTR ds:0x40402a
   0x0000000000401136 <+38>:	add    eax,ebx
   0x0000000000401138 <+40>:	mov    DWORD PTR ds:0x40402c,eax
   0x000000000040113f <+47>:	mov    eax,DWORD PTR ds:0x40402c
   0x0000000000401146 <+54>:	ret    
   0x0000000000401147 <+55>:	nop    WORD PTR [rax+rax*1+0x0]
End of assembler dump.
(gdb) 

You can compare:
test1 disassembly

   0x0000000000001129 <+0>:	endbr64 
   0x000000000000112d <+4>:	push   rbp
   0x000000000000112e <+5>:	mov    rbp,rsp
   0x0000000000001131 <+8>:	mov    DWORD PTR [rip+0x2edd],0x2        # 0x4018 <x>
   0x000000000000113b <+18>:	mov    DWORD PTR [rip+0x2ed7],0x3        # 0x401c <y>
   0x0000000000001145 <+28>:	mov    edx,DWORD PTR [rip+0x2ecd]        # 0x4018 <x>
   0x000000000000114b <+34>:	mov    eax,DWORD PTR [rip+0x2ecb]        # 0x401c <y>
   0x0000000000001151 <+40>:	add    eax,edx
   0x0000000000001153 <+42>:	mov    DWORD PTR [rip+0x2ebb],eax        # 0x4014 <z>
   0x0000000000001159 <+48>:	mov    eax,DWORD PTR [rip+0x2eb5]        # 0x4014 <z>
   0x000000000000115f <+54>:	pop    rbp
   0x0000000000001160 <+55>:	ret    

test2 disassembly

   0x0000000000401110 <+0>:	mov    eax,0x2
   0x0000000000401115 <+5>:	mov    DWORD PTR ds:0x404028,eax
   0x000000000040111c <+12>:	mov    eax,0x3
   0x0000000000401121 <+17>:	mov    DWORD PTR ds:0x40402a,eax
   0x0000000000401128 <+24>:	mov    eax,DWORD PTR ds:0x404028
   0x000000000040112f <+31>:	mov    ebx,DWORD PTR ds:0x40402a
   0x0000000000401136 <+38>:	add    eax,ebx
   0x0000000000401138 <+40>:	mov    DWORD PTR ds:0x40402c,eax
   0x000000000040113f <+47>:	mov    eax,DWORD PTR ds:0x40402c
   0x0000000000401146 <+54>:	ret    
   0x0000000000401147 <+55>:	nop    WORD PTR [rax+rax*1+0x0]

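Why does my disassembly of the C program differ so much from the hand-written one? Most of the differences come from the toolchain rather than the program: gcc wraps main in a function prologue and epilogue (endbr64, push rbp, mov rbp,rsp ... pop rbp), and because test1 was built as a position-independent executable it addresses x, y and z relative to rip, while test2 was linked with -no-pie and uses absolute addresses. The core sequence (store 2, store 3, load both, add, store to z, load z into eax) is the same in both.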

The author said that the assembly language of test2 can also be simplified as follows:

global main

main:
    mov dword [x], 0x2
    mov dword [y], 0x3
    mov eax, [x]
    mov ebx, [y]
    add eax, ebx
    mov [z], eax
    mov eax, [z]
    ret

section .data
x       dd      0
y       dd      0
z       dd      0

Introduction to assembly language 5: process control (I)

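The CPU has an internal register that records which instruction the program has reached.

x86: the eip register (rip on 64-bit systems)

eip cannot be written directly with an instruction such as mov; it changes only as a side effect of instructions like jmp, call and ret.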

Jump instruction jmp

global main

main:
    mov eax, 1
    mov ebx, 2
    
    jmp gun_kai
    
    add eax, ebx
gun_kai:
    ret

Equivalent C language program:

int main() {
    int a = 1;
    int b = 2;
    
    goto gun_kai;
    
    a = a + b;
    
gun_kai:
    return a;
}

In fact, a goto statement in C becomes a jmp instruction after compilation. Its job is to jump directly to some place, either forward or backward. The target of the jump is the label written after jmp. After assembly this label is turned into an address, so the instruction really jumps to an address. Inside the CPU, jmp simply modifies eip so that it suddenly holds another value; the CPU then continues executing the code at that new location.
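
The idea that jmp is nothing more than an assignment to eip can be sketched with a toy interpreter in C. The instruction set, the struct layout and the function names below are invented for illustration, and eip is an index into an array rather than a real address.

```c
#include <stdint.h>

/* A made-up instruction set, just large enough for the jmp example. */
enum op { OP_MOV_EAX, OP_MOV_EBX, OP_ADD_EAX_EBX, OP_JMP, OP_RET };

struct insn {
    enum op  op;
    uint32_t arg;   /* immediate value, or the target index for OP_JMP */
};

/* Executes a program until OP_RET and returns eax. */
uint32_t run(const struct insn *prog) {
    uint32_t eax = 0, ebx = 0, eip = 0;
    for (;;) {
        struct insn in = prog[eip];
        eip++;                                   /* default: next instruction */
        switch (in.op) {
        case OP_MOV_EAX:     eax = in.arg; break;
        case OP_MOV_EBX:     ebx = in.arg; break;
        case OP_ADD_EAX_EBX: eax += ebx;   break;
        case OP_JMP:         eip = in.arg; break; /* jmp only overwrites eip */
        case OP_RET:         return eax;
        }
    }
}

/* The listing above: the add is skipped because jmp sets eip to index 4. */
uint32_t run_jmp_example(void) {
    static const struct insn prog[] = {
        { OP_MOV_EAX, 1 },      /* 0: mov eax, 1   */
        { OP_MOV_EBX, 2 },      /* 1: mov ebx, 2   */
        { OP_JMP, 4 },          /* 2: jmp gun_kai  */
        { OP_ADD_EAX_EBX, 0 },  /* 3: add eax, ebx */
        { OP_RET, 0 },          /* 4: gun_kai: ret */
    };
    return run(prog);
}

/* The same program with the jmp replaced by a jump to the very next index. */
uint32_t run_fallthrough_example(void) {
    static const struct insn prog[] = {
        { OP_MOV_EAX, 1 },
        { OP_MOV_EBX, 2 },
        { OP_JMP, 3 },
        { OP_ADD_EAX_EBX, 0 },
        { OP_RET, 0 },
    };
    return run(prog);
}
```

Only the jump target differs between the two programs, yet one returns 1 and the other 3, which is the whole effect of changing eip.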

What if looks like in assembly

int main() {
    int a = 50;
    if( a > 10 ) {
        a = a - 10;
    }
    return a;
}

Equivalent assembly code:

global main

main:
    mov eax, 50
    cmp eax, 10                         ; compare eax with 10
    jle xiaoyu_dengyu_shi            ; jump if eax is less than or equal to 10
    sub eax, 10
xiaoyu_dengyu_shi:
    ret

Notes:

cmp: an instruction dedicated to comparing two numbers
jle: conditional jump instruction; it jumps when the preceding comparison result is "less than or equal to", otherwise execution continues with the next instruction

Other conditional jump instructions:

je Equal to
jnb Not below (Unsigned)
jnbe Not below or equal (Unsigned)
jne Not equal to
jg Greater than (Signed)
jge Greater than or equal to (Signed)
jl Less than (Signed)
jle Less than or equal to (Signed)
jng Not greater than (Signed)
jnge Not greater than or equal to (Signed)
jnl Not less than (Signed)
jnle Not less than or equal to (Signed)
jns Sign flag clear (result not negative)
jnz Not zero
js Sign flag set (result negative)
jz Zero

First, every jump instruction starts with the letter j.
The key is the letters after j.
For example, j followed by ne gives the jne jump instruction. n and e stand for not and equal, that is, "not equal": the jump is taken when the result of the comparison is "not equal".

a: above
e: equal
b: below
n: not
g: greater
l: less
s: sign
z: zero
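
The letter pairs above/below and greater/less exist because the same 32 bits can be read as an unsigned or a signed number, and the two readings order values differently. A small C sketch of that difference follows; the function names are made up, and the cast to int32_t assumes the usual two's complement behaviour of mainstream compilers.

```c
#include <stdint.h>

/* The l/g family (jl, jg, ...) compares the operands as signed numbers. */
int less_signed(uint32_t a, uint32_t b) {
    return (int32_t)a < (int32_t)b;
}

/* The b/a family (jb, ja, ...) compares the operands as unsigned numbers. */
int below_unsigned(uint32_t a, uint32_t b) {
    return a < b;
}
```

For the bit pattern 0xFFFFFFFF compared with 1, the signed reading (-1) is less than 1, while the unsigned reading (4294967295) is not below 1, so jl would jump and jb would not.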

Viewing the disassembly of else and else if

Create test.c:

int main() {
    register int grade = 80;
    register int level;
    if ( grade >= 85 ){
        level = 1;
    } else if ( grade >= 70 ) {
        level = 2;
    } else if ( grade >= 60 ) {
        level = 3;
    } else {
        level = 4;
    }
    return level;
}

(The program uses the register keyword, which asks the compiler to keep the variable in a register after compilation. This makes the disassembly easier to analyze. Readers can remove the register keyword and compare the disassembly as needed.)

Run the program first to check the return value:

dontla@dontla-virtual-machine:~/desktop/test$ gcc -m64 test.c -o test
dontla@dontla-virtual-machine:~/desktop/test$ ./test ; echo $?
2

View assembly code with gdb disassembly:

dontla@dontla-virtual-machine:~/desktop/test$ gdb ./test
Reading symbols from ./test...
(No debugging symbols found in ./test)
(gdb) set disassembly-flavor intel
(gdb) disas main
Dump of assembler code for function main:
   0x0000000000001129 <+0>:	endbr64 
   0x000000000000112d <+4>:	push   rbp
   0x000000000000112e <+5>:	mov    rbp,rsp
   0x0000000000001131 <+8>:	push   rbx
   0x0000000000001132 <+9>:	mov    ebx,0x50
   0x0000000000001137 <+14>:	cmp    ebx,0x54
   0x000000000000113a <+17>:	jle    0x1143 <main+26>
   0x000000000000113c <+19>:	mov    ebx,0x1
   0x0000000000001141 <+24>:	jmp    0x1160 <main+55>
   0x0000000000001143 <+26>:	cmp    ebx,0x45
   0x0000000000001146 <+29>:	jle    0x114f <main+38>
   0x0000000000001148 <+31>:	mov    ebx,0x2
   0x000000000000114d <+36>:	jmp    0x1160 <main+55>
   0x000000000000114f <+38>:	cmp    ebx,0x3b
   0x0000000000001152 <+41>:	jle    0x115b <main+50>
   0x0000000000001154 <+43>:	mov    ebx,0x3
   0x0000000000001159 <+48>:	jmp    0x1160 <main+55>
   0x000000000000115b <+50>:	mov    ebx,0x4
   0x0000000000001160 <+55>:	mov    eax,ebx
   0x0000000000001162 <+57>:	pop    rbx
   0x0000000000001163 <+58>:	pop    rbp
   0x0000000000001164 <+59>:	ret    
End of assembler dump.
(gdb) 

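It looks dizzying at first, but the pattern is regular: each grade >= N test becomes a cmp of ebx against N - 1 followed by jle to the next test (0x54 is 84, 0x45 is 69, 0x3b is 59), and each branch ends with a jmp to the common exit at main+55, where ebx is copied into eax as the return value.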
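
One way to read the disassembly is to translate it back into C jump by jump. The sketch below does that by hand; the label names and the function name level_from_grade are invented, and the comments point at the instructions each line corresponds to.

```c
/* Hand-written C that follows the disassembly above one jump at a time.
   grade and level share ebx in the listing; they are kept separate here. */
int level_from_grade(int grade) {
    int level;
    if (grade <= 84) goto check_70;    /* cmp ebx,0x54 ; jle main+26 */
    level = 1;                         /* mov ebx,0x1 */
    goto done;                         /* jmp main+55 */
check_70:
    if (grade <= 69) goto check_60;    /* cmp ebx,0x45 ; jle main+38 */
    level = 2;                         /* mov ebx,0x2 */
    goto done;                         /* jmp main+55 */
check_60:
    if (grade <= 59) goto otherwise;   /* cmp ebx,0x3b ; jle main+50 */
    level = 3;                         /* mov ebx,0x3 */
    goto done;                         /* jmp main+55 */
otherwise:
    level = 4;                         /* mov ebx,0x4 */
done:
    return level;                      /* mov eax,ebx ; ret */
}
```

The compiler has inverted every condition: instead of "enter the block if grade >= 85" it emits "skip the block if grade <= 84", which is the same test seen from the other side.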

Status register eflags

The job of the "flag register" is to remember certain CPU states, such as whether the result of the previous operation was positive or negative, whether the calculation produced a carry, and whether the result was zero. A subsequent conditional jump instruction decides whether to jump according to the state in the eflags register.
The cmp instruction actually subtracts its two operands, and the outcome of that subtraction is reflected in the eflags register.
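
As a rough sketch of how this works, the C code below computes the four flags that matter for comparisons (zero, sign, carry, overflow) from the subtraction a - b, and then decides jle and jg from those flags alone, using the standard x86 conditions. The struct and function names are made up for illustration.

```c
#include <stdint.h>

/* The four eflags bits used by the conditional jumps discussed here. */
struct flags { int zf, sf, cf, of; };

/* Flags left behind by `cmp a, b`, which computes a - b and drops the result. */
struct flags cmp_flags(uint32_t a, uint32_t b) {
    uint32_t r = a - b;
    struct flags f;
    f.zf = (r == 0);                          /* result is zero            */
    f.sf = (int)(r >> 31);                    /* top bit of the result     */
    f.cf = (a < b);                           /* unsigned borrow           */
    f.of = (int)(((a ^ b) & (a ^ r)) >> 31);  /* signed overflow           */
    return f;
}

/* jle: jump if ZF is set or SF differs from OF. */
int jle_taken(struct flags f) { return f.zf || (f.sf != f.of); }

/* jg: jump if ZF is clear and SF equals OF. */
int jg_taken(struct flags f)  { return !f.zf && (f.sf == f.of); }
```

For the earlier example, cmp eax, 10 with eax = 50 leaves all four flags clear, so jle is not taken and the sub eax, 10 runs.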

Introduction to assembly language 6: process control (2)

Disassembling a loop structure

This is a C language loop program:

int main() {
    int sum = 0;
    int i = 1;
    while( i <= 10 ) {
        sum = sum + i;
        i = i + 1;
    }
    return sum;
}

If you can't use a loop statement, how do you implement it with goto?

int sum = 0;
int i = 1;

_start:
if( i <= 10 ) {
    sum = sum + i;
    i = i + 1;
    goto _start;
}

Of course, the "knock-off" C code you get by translating the disassembly back into C actually looks like this:

int sum = 0;
int i = 1;

_start:
if( i > 10 ) {
    goto _end_of_block;
}

sum = sum + i;
i = i + 1;
goto _start;

_end_of_block:

. . .

Write a loop in assembly

Translate the above code directly into assembly, as follows:

global main

main:
    mov eax, 0
    mov ebx, 1
_start:
    cmp ebx, 10
    jg _end_of_block
    
    add eax, ebx
    add ebx, 1
    jmp _start
    
_end_of_block:
    ret

It is nothing more than a direct substitution with assembly instructions we already know

What about other loops?

They all follow the same pattern. There is no real difference
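
To back up the claim that other loops follow the same pattern, here is a sketch in C that writes a for loop and a do-while loop in goto form next to the ordinary version. The function names and labels are invented; the only real difference between the variants is where the comparison sits.

```c
/* Ordinary for loop, for reference. */
int sum_for(int n) {
    int sum = 0;
    for (int i = 1; i <= n; i++) sum += i;
    return sum;
}

/* for/while in goto form: test at the top, jump back at the bottom. */
int sum_goto(int n) {
    int sum = 0;
    int i = 1;
loop_start:
    if (i > n) goto loop_end;    /* cmp + jg  */
    sum += i;                    /* add       */
    i += 1;                      /* add       */
    goto loop_start;             /* jmp       */
loop_end:
    return sum;
}

/* do-while in goto form: the body runs first, the test sits at the bottom. */
int sum_do_while_goto(int n) {
    int sum = 0;
    int i = 1;
loop_body:
    sum += i;
    i += 1;
    if (i <= n) goto loop_body;  /* cmp + jle */
    return sum;
}
```

All three return 55 for n = 10. They differ only for n = 0, where the do-while form still runs its body once.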

Introduction to assembly language 7: function call (1)

Example of assembly language calling function:

global main

eax_plus_1s:
    add eax, 1
    ret

ebx_plus_1s:
    add ebx, 1
    ret

main:
    mov eax, 0
    mov ebx, 0
    call eax_plus_1s
    call eax_plus_1s
    call ebx_plus_1s
    add eax, ebx
    ret

In fact, when a call instruction is executed, the CPU does one more thing before jumping: it saves eip (which already points at the instruction after the call) and then jumps to the target. When a ret instruction is encountered, the eip saved by the most recent call is restored. Since eip directly determines where the CPU executes code, restoring it means the program continues right after the call.

A program inevitably makes many calls. Where are these eip values saved?

They are saved in a place called the "stack", a memory area set up by the operating system before the program starts. The return address of each function call is stored on the stack.

What is stack?

. . .

In a real CPU, the top of the stack is also recorded by a register, called esp (stack pointer). It is updated every time a call instruction is executed.

When eip is pushed onto the stack (on a function call), it is roughly equivalent to executing these instructions (pseudocode: eip cannot actually be used as a mov operand):

sub esp, 4
mov dword ptr[esp], eip

Translated into C (treating esp as a void * pointer):

esp = (void*)( ((unsigned int)esp) - 4 );   // the stack pointer moves 4 bytes toward lower addresses
*( (unsigned int*) esp ) = (unsigned int) eip;   // save the return address in the memory the stack pointer points to
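
The call and ret mechanism can be tried out in a toy interpreter written in C. This is only a sketch under simplifying assumptions: the instruction set is invented, addresses are array indices, each stack slot holds one value (so esp moves by 1 instead of 4), and EXIT_ADDRESS is a made-up marker standing for whoever called main.

```c
#include <assert.h>
#include <stdint.h>

enum op { MOV_EAX, MOV_EBX, ADD_EAX_1, ADD_EBX_1, ADD_EAX_EBX, CALL, RET };
struct insn { enum op op; uint32_t arg; };  /* arg: immediate or call target */

/* The program from the first listing of this chapter. */
static const struct insn program[] = {
    /*  0 eax_plus_1s: */ { ADD_EAX_1, 0 },
    /*  1              */ { RET, 0 },
    /*  2 ebx_plus_1s: */ { ADD_EBX_1, 0 },
    /*  3              */ { RET, 0 },
    /*  4 main:        */ { MOV_EAX, 0 },
    /*  5              */ { MOV_EBX, 0 },
    /*  6              */ { CALL, 0 },
    /*  7              */ { CALL, 0 },
    /*  8              */ { CALL, 2 },
    /*  9              */ { ADD_EAX_EBX, 0 },
    /* 10              */ { RET, 0 },
};

#define STACK_SLOTS  16
#define EXIT_ADDRESS 0xFFFFFFFFu  /* hypothetical return address of main's caller */

/* Runs the program starting at index `entry` and returns eax. */
uint32_t run_from(uint32_t entry) {
    uint32_t stack[STACK_SLOTS];
    uint32_t esp = STACK_SLOTS;          /* the stack grows toward lower indices */
    uint32_t eax = 0, ebx = 0, eip = entry;
    stack[--esp] = EXIT_ADDRESS;         /* as if something had called main */
    for (;;) {
        struct insn in = program[eip++]; /* eip now points at the next instruction */
        switch (in.op) {
        case MOV_EAX:     eax = in.arg; break;
        case MOV_EBX:     ebx = in.arg; break;
        case ADD_EAX_1:   eax += 1;     break;
        case ADD_EBX_1:   ebx += 1;     break;
        case ADD_EAX_EBX: eax += ebx;   break;
        case CALL:
            assert(esp > 0);
            stack[--esp] = eip;          /* push the return address */
            eip = in.arg;                /* jump to the function */
            break;
        case RET:
            eip = stack[esp++];          /* pop the saved address back into eip */
            if (eip == EXIT_ADDRESS) return eax;
            break;
        }
    }
}
```

Starting at index 4 (main), the two calls to eax_plus_1s and one call to ebx_plus_1s each push a return address and pop it again, and the final result is eax = 3.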

Hands on test

Create test.asm:

global main

eax_plus_1s:
    add eax, 1
    ret

main:
    mov eax, 0
    call eax_plus_1s
    ret

Compile the executable file test first:

dontla@dontla-virtual-machine:~/desktop/test$ nasm -f elf64 test.asm -o test.o
dontla@dontla-virtual-machine:~/desktop/test$ gcc -m64 test.o -o test
dontla@dontla-virtual-machine:~/desktop/test$ ls
test  test.asm  test.o
dontla@dontla-virtual-machine:~/desktop/test$ 

Load it in gdb, run it once, disassemble main, and set a breakpoint at +5 (run before disas, so that the addresses shown are the runtime addresses):

dontla@dontla-virtual-machine:~/desktop/test$ gdb ./test
Reading symbols from ./test...
(No debugging symbols found in ./test)
(gdb) r
Starting program: /home/dontla/desktop/test/test 
[Inferior 1 (process 38995) exited with code 01]
(gdb) disas main
Dump of assembler code for function main:
   0x0000555555555134 <+0>:	mov    $0x0,%eax
   0x0000555555555139 <+5>:	callq  0x555555555130 <eax_plus_1s>
   0x000055555555513e <+10>:	retq   
   0x000055555555513f <+11>:	nop
End of assembler dump.
(gdb) b *0x0000555555555139
Breakpoint 1 at 0x555555555139
(gdb) r
Starting program: /home/dontla/desktop/test/test 

Breakpoint 1, 0x0000555555555139 in main ()
(gdb) 

Then view the disassembly with disas main:

(gdb) disas main
Dump of assembler code for function main:
   0x0000555555555134 <+0>:	mov    $0x0,%eax
=> 0x0000555555555139 <+5>:	callq  0x555555555130 <eax_plus_1s>
   0x000055555555513e <+10>:	retq   
   0x000055555555513f <+11>:	nop
End of assembler dump.
(gdb) 

To view the value of the eip register:

(gdb) info register eip
Invalid register `eip'

An error is reported saying that the register eip is invalid.

Reason: a 64-bit system has no eip register in gdb; the instruction pointer is the 64-bit rip register.

Check the value of the instruction pointer register rip (a register has no memory address of its own; only its value is meaningful):

(gdb) info register rip
rip            0x555555555139      0x555555555139 <main+5>

View the value of the stack pointer register rsp (rsp holds the address of the top of the stack):

(gdb) info registers rsp
rsp            0x7fffffffe408      0x7fffffffe408

View the value stored at the top of the stack, that is, at the address held in rsp:

(gdb) p/x *(unsigned int*)$rsp
$1 = 0xf7dea0b3

Next, use stepi to execute the next instruction. The program enters the function; use disas to see which instruction inside the function is about to run:

(gdb) stepi
0x0000555555555130 in eax_plus_1s ()
(gdb) disas
Dump of assembler code for function eax_plus_1s:
=> 0x0000555555555130 <+0>:	add    $0x1,%eax
   0x0000555555555133 <+3>:	retq   
End of assembler dump.

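Now look at the value of the rsp register. It is 8 less than before (0x7fffffffe408 became 0x7fffffffe400), because a return address on a 64-bit system takes 8 bytes: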

(gdb) info register rsp
rsp            0x7fffffffe400      0x7fffffffe400

Then check what rsp now points to at the top of the stack:

(gdb) p/x *(unsigned int*)$rsp
$2 = 0x5555513e

What is this $2 = 0x5555513e? Don't worry. Let's look at the disassembly of the main function again:

Dump of assembler code for function main:
   0x0000555555555134 <+0>:	mov    $0x0,%eax
   0x0000555555555139 <+5>:	callq  0x555555555130 <eax_plus_1s>
   0x000055555555513e <+10>:	retq   
   0x000055555555513f <+11>:	nop
End of assembler dump.

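Why is it different from the author's, and why are there so many 5s? The author's output presumably came from a 32-bit build, while this is a 64-bit position-independent executable, which gdb loads at an address beginning with 0x0000555555555. Because the expression casts to unsigned int, gdb prints only the low 4 bytes of the 8-byte value: 0x5555513e is the low half of 0x000055555555513e. Printing the whole 8-byte slot, for example with x/gx $rsp, should show the full address.

In any case, the value at the top of the stack can be identified (take care to distinguish the address of the top of the stack from the value stored there): it is the address of the instruction after the call (main+10), which is where execution resumes once the function returns.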
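
The two numeric observations in this gdb session can be checked with a few lines of C. The helper names are made up; the first mimics what the unsigned int cast does to an 8-byte return address, and the second the effect of a 64-bit call on rsp.

```c
#include <stdint.h>

/* What `p/x *(unsigned int*)$rsp` shows for an 8-byte stack slot: only the
   low 4 bytes (x86-64 is little-endian, so they come first in memory). */
uint32_t low_32_bits(uint64_t return_address) {
    return (uint32_t)(return_address & 0xFFFFFFFFu);
}

/* A 64-bit call pushes an 8-byte return address, so rsp drops by 8. */
uint64_t rsp_after_call(uint64_t rsp) {
    return rsp - 8;
}
```

Feeding in the values from the session, 0x000055555555513e is cut down to 0x5555513e, and 0x7fffffffe408 becomes 0x7fffffffe400.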

Introduction to assembly language 8: function call (2)

How parameters and return values are passed during a function call: in assembly language, the parameters and return value of a function call can be passed through registers

It's not that simple

. . .

Beware of scope

. . .

Registers in the CPU are globally visible. So using registers is actually using something like a global variable.

What exactly do you need

To achieve recursion, the state of the function needs to be locally visible and can only be accessed in the current layer of functions. In recursion, layers will call themselves. The state between each layer should ensure locality and cannot affect each other.

In the environment of C language, the local variables in a function are actually the local state when the function is executed. In the assembly environment, registers are globally visible and cannot be used as local variables.

stack

. . .

Following the idea of saving the return address for the call instruction: if each layer of function saves the registers it cares about on the stack before calling the next layer, and restores them from the stack when the lower layer returns, then the lower layer cannot destroy the state of the upper layer.

. . .

Push and pop

. . .

push eax            ; save the value of eax on the stack
pop ebx         ; take the value at the top of the stack and store it in ebx

. . .

Make a function that won't affect the whole world

. . .

Recursion again

With that, we have solved the problem of keeping a function's state local. The routine is simple: before a function uses a register, it saves the old value, and after it has finished with the register, it restores that value. When the function returns, all registers look untouched, as if the function had never used them.
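
The effect of this routine on recursion can be sketched in C by using a global variable as the "register" and an array as the stack. Everything here is a simplified model with invented names: sum_clobbering keeps its argument in ebx without saving it, and sum_saving wraps the same logic in a push at entry and a pop before returning.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t ebx;            /* one globally visible "register" */
static uint32_t stack[64];
static int top = 0;             /* number of values currently on the stack */

static void push(uint32_t v)  { assert(top < 64); stack[top++] = v; }
static uint32_t pop(void)     { assert(top > 0);  return stack[--top]; }

/* Broken: each level keeps n in ebx, but the inner call overwrites it. */
uint32_t sum_clobbering(uint32_t n) {
    ebx = n;
    if (ebx == 0) return 0;
    uint32_t eax = sum_clobbering(ebx - 1);  /* ebx is 0 when this returns */
    return eax + ebx;
}

/* Fixed: save the caller's ebx on entry and restore it before returning. */
uint32_t sum_saving(uint32_t n) {
    push(ebx);                               /* push ebx */
    ebx = n;
    uint32_t eax = 0;
    if (ebx != 0) {
        eax = sum_saving(ebx - 1);           /* inner call restores ebx to n */
        eax += ebx;
    }
    ebx = pop();                             /* pop ebx */
    return eax;
}
```

Summing 1 to 3 gives 0 with the clobbering version, because every level sees the innermost value of ebx, and 6 with the saving version.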

Functions in C language

. . .

Conclusion: I still don't fully understand why push and pop are used.

Take a look at this:

[push and pop beep beep beep] https://b23.tv/ZBdrp3

push x saves the value of x in a new slot on top of the stack, and pop takes the top value back off. The stack does not remember which register a value came from, so pushes and pops must be paired carefully: values come off in the reverse order in which they were pushed, and mixing up the order puts them in the wrong registers.
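
The ordering rule is easy to see in a small C model. The struct regs type and both function names are hypothetical; the stack is a plain array, as in a last-in, first-out list.

```c
#include <assert.h>
#include <stdint.h>

struct regs { uint32_t eax, ebx; };

static uint32_t stack[8];
static int top = 0;             /* number of values currently on the stack */

static void push(uint32_t v)  { assert(top < 8); stack[top++] = v; }
static uint32_t pop(void)     { assert(top > 0); return stack[--top]; }

/* push eax; push ebx; ...; pop ebx; pop eax: reverse order restores both. */
struct regs save_and_restore(struct regs r) {
    push(r.eax);
    push(r.ebx);
    r.eax = 0;                  /* the function body overwrites both registers */
    r.ebx = 0;
    r.ebx = pop();
    r.eax = pop();
    return r;
}

/* push eax; push ebx; pop eax; pop ebx: same order swaps the two values. */
struct regs save_and_restore_wrong_order(struct regs r) {
    push(r.eax);
    push(r.ebx);
    r.eax = pop();
    r.ebx = pop();
    return r;
}
```

Starting from eax = 1 and ebx = 2, the first function returns the registers unchanged, while the second returns them swapped.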

Introduction to assembly language 9: summary and follow-up (gossip)

What do you learn from assembly

. . .

Topics: Linux Assembly Language Ubuntu