Compiling and linking series -- hands-on consolidation

Posted by Hi I Am Timbo on Sun, 31 Oct 2021 18:22:28 +0100

In the last chapter I covered a lot of theory. Theory on paper only goes so far; it has to be put into practice. Today we will run some tests on an ARM Linux platform to deepen our understanding and see how the compiler uses the stack. Without further ado, the code:

#include <stdio.h>

int fun(int a, int b)
{
    int c = 10;
    return c * (a + b);
}

int main()
{
    int  a1 = 10;
    int  a2 = 10;
    char b = 'h';
    int  c[10];
    int res = fun(a1, a2);
    printf("res = %d\n", res);
    return 0;
}

The generated executable is test_stack. Running objdump -SD test_stack on it gives the following key parts of the disassembly:

00010440 <fun>:
   10440:       e52db004        push    {fp}            ; (str fp, [sp, #-4]!)
   10444:       e28db000        add     fp, sp, #0
   10448:       e24dd014        sub     sp, sp, #20
   1044c:       e50b0010        str     r0, [fp, #-16]
   10450:       e50b1014        str     r1, [fp, #-20]  ; 0xffffffec
   10454:       e3a0300a        mov     r3, #10
   10458:       e50b3008        str     r3, [fp, #-8]
   1045c:       e51b2010        ldr     r2, [fp, #-16]
   10460:       e51b3014        ldr     r3, [fp, #-20]  ; 0xffffffec
   10464:       e0823003        add     r3, r2, r3
   10468:       e51b2008        ldr     r2, [fp, #-8]
   1046c:       e0030392        mul     r3, r2, r3
   10470:       e1a00003        mov     r0, r3
   10474:       e28bd000        add     sp, fp, #0
   10478:       e49db004        pop     {fp}            ; (ldr fp, [sp], #4)
   1047c:       e12fff1e        bx      lr

00010480 <main>:
   10480:       e92d4800        push    {fp, lr}
   10484:       e28db004        add     fp, sp, #4
   10488:       e24dd038        sub     sp, sp, #56     ; 0x38
   1048c:       e3a0300a        mov     r3, #10
   10490:       e50b3008        str     r3, [fp, #-8]
   10494:       e3a0300a        mov     r3, #10
   10498:       e50b300c        str     r3, [fp, #-12]
   1049c:       e3a03068        mov     r3, #104        ; 0x68
   104a0:       e54b300d        strb    r3, [fp, #-13]
   104a4:       e51b100c        ldr     r1, [fp, #-12]
   104a8:       e51b0008        ldr     r0, [fp, #-8]
   104ac:       ebffffe3        bl      10440 <fun>
   104b0:       e50b0014        str     r0, [fp, #-20]  ; 0xffffffec
   104b4:       e51b1014        ldr     r1, [fp, #-20]  ; 0xffffffec
   104b8:       e59f0010        ldr     r0, [pc, #16]   ; 104d0 <main+0x50>
   104bc:       ebffff89        bl      102e8 <printf@plt>
   104c0:       e3a03000        mov     r3, #0
   104c4:       e1a00003        mov     r0, r3
   104c8:       e24bd004        sub     sp, fp, #4
   104cc:       e8bd8800        pop     {fp, pc}
   104d0:       00010544        andeq   r0, r1, r4, asr #10

//From the disassembly of main above, we can see that fp is very important to a function, because the local variables in the function are basically addressed as fp + offset. Therefore, on a function call, the caller's old fp has to be pushed onto the stack and then restored when the function returns. Otherwise the local variables would all be addressed incorrectly.

 

Let's start from main. On entry, main first executes push {fp, lr}, which pushes the current contents of the fp and lr registers onto the stack; these are the values that must be restored before main returns. Next comes add fp, sp, #4, i.e. fp = sp + 4. Why + 4? The push stores the lower-numbered register at the lower address, so after it sp points at the saved fp and sp + 4 points at the saved lr. The old fp is now safely saved, but we still need fp to address all the local variables, so fp is set to sp + 4, the highest word of main's new stack frame (the slot holding lr).

Then sub sp, sp, #56 is more obvious: sp is moved down by 56 bytes. Remember that ARM uses a full descending stack, so this reserves a block of space in one go, and sp points to the top of the stack. From now on, the space between fp and sp is the activation record (stack frame) of main. Let's calculate the size occupied by the local variables in main:

3 * sizeof(int) + sizeof(char) + 10 * sizeof(int) = 53. Strange: why is it 56 here? Let's read on first:

   1048c:       e3a0300a        mov     r3, #10
   10490:       e50b3008        str     r3, [fp, #-8]
   10494:       e3a0300a        mov     r3, #10
   10498:       e50b300c        str     r3, [fp, #-12]
   1049c:       e3a03068        mov     r3, #104        ; 0x68
   104a0:       e54b300d        strb    r3, [fp, #-13]

r3 = 10, then str r3, [fp, #-8] stores the contents of r3 at address fp - 8, which is exactly the first usable slot in main's activation record (fp - 4 holds the saved fp). strb is the byte version of str, which matches our char type. We calculated 53 bytes, but 56 are actually reserved. My guess is that the char effectively takes up 4 bytes on the stack because of alignment: b is stored at fp - 13 while res (seen further down) sits at fp - 20, so the three bytes from fp - 16 to fp - 14 are padding, and 53 + 3 = 56. To verify this guess I did a simple experiment: set c[0] = 0 and c[9] = 0, then disassemble again to see where the array lands. That gives a rough picture of how the whole stack is laid out (see the figure).

 

Then comes the call to fun:

   104a4:       e51b100c        ldr     r1, [fp, #-12]
   104a8:       e51b0008        ldr     r0, [fp, #-8]
   104ac:       ebffffe3        bl      10440 <fun>

Clearly the arguments are loaded, from right to left, into r1 and r0 rather than being pushed onto the stack. This is not really a compiler optimization: under the ARM procedure call standard (AAPCS), the first four word-sized arguments are passed in r0 to r3, and only the remaining ones go on the stack. Then bl 10440 jumps into fun's territory, where the stack starts to grow again.

   10440:       e52db004        push    {fp}            ; (str fp, [sp, #-4]!)
   10444:       e28db000        add     fp, sp, #0
   10448:       e24dd014        sub     sp, sp, #20
   1044c:       e50b0010        str     r0, [fp, #-16]
   10450:       e50b1014        str     r1, [fp, #-20]  ; 0xffffffec

First fp is pushed; this is main's fp being saved. Then fp = sp + 0, and this fp is fun's own fp. Then sp = sp - 20.

After these steps, both fp and sp point to new locations. What is strange is that fun's stack frame reserves 20 bytes, although we have only a single int c. Let's continue with this question in mind:

   10454:       e3a0300a        mov     r3, #10
   10458:       e50b3008        str     r3, [fp, #-8]
   1045c:       e51b2010        ldr     r2, [fp, #-16]
   10460:       e51b3014        ldr     r3, [fp, #-20]  ; 0xffffffec
   10464:       e0823003        add     r3, r2, r3
   10468:       e51b2008        ldr     r2, [fp, #-8]
   1046c:       e0030392        mul     r3, r2, r3
   10470:       e1a00003        mov     r0, r3
   10474:       e28bd000        add     sp, fp, #0
   10478:       e49db004        pop     {fp}            ; (ldr fp, [sp], #4)
   1047c:       e12fff1e        bx      lr

It can be seen that fun first stores the values of r0 and r1 into its own stack frame. In other words, it puts the function's parameters on the stack, which answers the question above: a, b and c together take 12 bytes, and the rest is presumably alignment padding. It then loads the parameters from the stack into r2 and r3, uses add and mul to do the computation, and places the final result in r0.

Then sp = fp + 0. This step is the crucial one: sp now takes the value of fp, so sp points at the base address of fun's stack frame. The next move is pop {fp}, which loads the word that sp currently points to into the fp register, so fp once again holds the base address of main's stack frame. Because of the pop, sp also increases by 4, back to where it was when fun was entered. Finally bx lr makes the pc jump to the instruction after the bl, and execution continues there. In this way we return to main. The figure illustrates this stage.

 

Back in main, let's continue reading:

   104b0:       e50b0014        str     r0, [fp, #-20]  ; 0xffffffec
   104b4:       e51b1014        ldr     r1, [fp, #-20]  ; 0xffffffec
   104b8:       e59f0010        ldr     r0, [pc, #16]   ; 104d0 <main+0x50>
   104bc:       ebffff89        bl      102e8 <printf@plt>
   104c0:       e3a03000        mov     r3, #0
   104c4:       e1a00003        mov     r0, r3
   104c8:       e24bd004        sub     sp, fp, #4
   104cc:       e8bd8800        pop     {fp, pc}
   104d0:       00010544        andeq   r0, r1, r4, asr #10

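Recall that res lives at fp - 20 and that fun left its return value in r0. So r0 is first stored to fp - 20, then res is loaded into r1 as printf's second argument, and the word at pc + 16 is loaded into r0 as the first argument. That word, 0x00010544 at address 104d0, is the address of the format string; objdump -D disassembles data as well, which is why it shows up as a meaningless andeq instruction.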

//A point worth knowing here: how does pc + 16 come out as 104d0?
It is a consequence of the instruction pipeline: when ldr r0, [pc, #16] executes, reading pc gives the address of this instruction + 8.
That is, although the current instruction is at 104b8, pc reads as 104b8 + 8 = 104c0, and adding 16 gives exactly 104d0.
The reason is ARM's classic three-stage pipeline: pc points at the instruction being fetched, pc - 4 at the one being decoded, and pc - 8 at the one being executed.

Then bl branches to printf:

000102e8 <printf@plt>:
   102e8:       e28fc600        add     ip, pc, #0, 12  ; ip = pc + (0 ror 12) = pc
   102ec:       e28cca10        add     ip, ip, #16, 20 ; 0x10000
   102f0:       e5bcfd1c        ldr     pc, [ip, #3356]!        ; 0xd1c

This one is a bit harder to chew: it goes through the PLT and involves dynamic linking and shared libraries. Let's look at it later.