The last chapter covered a lot of theory. Knowledge from paper always feels shallow, and real understanding only comes from practice. So today we will run some tests on the ARM Linux platform to deepen our understanding and see how the compiler actually uses the stack. Without further ado, here is the code:
#include <stdio.h>

int fun(int a, int b)
{
    int c = 10;
    return c * (a + b);
}

int main()
{
    int a1 = 10;
    int a2 = 10;
    char b = 'h';
    int c[10];
    int res = fun(a1, a2);
    printf("res = %d\n", res);
    return 0;
}
For the generated executable test_stack, run objdump -SD test_stack. The key parts of the disassembly are as follows:
00010440 <fun>:
   10440:  e52db004  push  {fp}              ; (str fp, [sp, #-4]!)
   10444:  e28db000  add   fp, sp, #0
   10448:  e24dd014  sub   sp, sp, #20
   1044c:  e50b0010  str   r0, [fp, #-16]
   10450:  e50b1014  str   r1, [fp, #-20]    ; 0xffffffec
   10454:  e3a0300a  mov   r3, #10
   10458:  e50b3008  str   r3, [fp, #-8]
   1045c:  e51b2010  ldr   r2, [fp, #-16]
   10460:  e51b3014  ldr   r3, [fp, #-20]    ; 0xffffffec
   10464:  e0823003  add   r3, r2, r3
   10468:  e51b2008  ldr   r2, [fp, #-8]
   1046c:  e0030392  mul   r3, r2, r3
   10470:  e1a00003  mov   r0, r3
   10474:  e28bd000  add   sp, fp, #0
   10478:  e49db004  pop   {fp}              ; (ldr fp, [sp], #4)
   1047c:  e12fff1e  bx    lr

00010480 <main>:
   10480:  e92d4800  push  {fp, lr}
   10484:  e28db004  add   fp, sp, #4
   10488:  e24dd038  sub   sp, sp, #56       ; 0x38
   1048c:  e3a0300a  mov   r3, #10
   10490:  e50b3008  str   r3, [fp, #-8]
   10494:  e3a0300a  mov   r3, #10
   10498:  e50b300c  str   r3, [fp, #-12]
   1049c:  e3a03068  mov   r3, #104          ; 0x68
   104a0:  e54b300d  strb  r3, [fp, #-13]
   104a4:  e51b100c  ldr   r1, [fp, #-12]
   104a8:  e51b0008  ldr   r0, [fp, #-8]
   104ac:  ebffffe3  bl    10440 <fun>
   104b0:  e50b0014  str   r0, [fp, #-20]    ; 0xffffffec
   104b4:  e51b1014  ldr   r1, [fp, #-20]    ; 0xffffffec
   104b8:  e59f0010  ldr   r0, [pc, #16]     ; 104d0 <main+0x50>
   104bc:  ebffff89  bl    102e8 <printf@plt>
   104c0:  e3a03000  mov   r3, #0
   104c4:  e1a00003  mov   r0, r3
   104c8:  e24bd004  sub   sp, fp, #4
   104cc:  e8bd8800  pop   {fp, pc}
   104d0:  00010544  andeq r0, r1, r4, asr #10
From the disassembly of main above we can see that fp matters a great deal to a function, because the local variables of the function are mostly addressed as fp plus an offset. So whenever a function call happens, the current (old) fp has to be pushed onto the stack and then restored when the function returns. Otherwise the caller's local variables would all be addressed from the wrong base.
Start with main. On entry it first executes push {fp, lr}, which saves the current contents of the fp and lr registers on the stack; these are the values that have to be restored before main returns. Next comes add fp, sp, #4, that is, fp = sp + 4. What is this for? The old fp has just been saved, but main still needs an fp of its own to address all of its local variables. push stores the lower-numbered register at the lower address, so after the push sp points at the saved fp and sp + 4 points at the saved lr. fp = sp + 4 therefore points at the first word main placed on the stack, which becomes the base of main's stack frame.
Then comes sub sp, sp, #56, which is more obvious: sp is moved down by 56 bytes. Recall that ARM uses a full descending stack, so this reserves a block of space in one step, and sp again points at the top of the stack. From now on, the space between fp and sp is the activation record of main. Let's calculate the size occupied by the local variables in main:
3 * sizeof(int) + sizeof(char) + 10 * sizeof(int) = 53 bytes. Strange: why is 56 reserved here? Let's read on first:
   1048c:  e3a0300a  mov   r3, #10
   10490:  e50b3008  str   r3, [fp, #-8]
   10494:  e3a0300a  mov   r3, #10
   10498:  e50b300c  str   r3, [fp, #-12]
   1049c:  e3a03068  mov   r3, #104          ; 0x68
   104a0:  e54b300d  strb  r3, [fp, #-13]
r3 = 10, then str r3, [fp, #-8] stores the contents of r3 at address fp - 8, which is the first free slot in main's activation record (fp - 4 holds the saved fp). strb is the byte variant of str, which matches our char type. We calculated 53 bytes, but 56 were actually reserved. My guess is that the char also takes up 4 bytes on the stack, presumably for alignment: b is stored at fp - 13, while the disassembly shows res at fp - 20, so the 3 bytes in between are padding. I did a simple experiment to check this guess: set c[0] = 0 and c[9] = 0, then disassemble again. From that, the layout of the whole stack can be roughly worked out as follows:
Next comes the call to fun:
   104a4:  e51b100c  ldr   r1, [fp, #-12]
   104a8:  e51b0008  ldr   r0, [fp, #-8]
   104ac:  ebffffe3  bl    10440 <fun>
The arguments are clearly loaded from right to left into r1 and r0; they are not pushed onto the stack. This is not really a compiler optimization: the ARM procedure call standard (AAPCS) passes the first four word-sized arguments in r0 to r3, and only the remaining ones go on the stack. Then bl 10440 jumps into fun's territory, where the stack starts to grow again.
   10440:  e52db004  push  {fp}              ; (str fp, [sp, #-4]!)
   10444:  e28db000  add   fp, sp, #0
   10448:  e24dd014  sub   sp, sp, #20
   1044c:  e50b0010  str   r0, [fp, #-16]
   10450:  e50b1014  str   r1, [fp, #-20]    ; 0xffffffec
First push {fp} saves main's fp on the stack. Then fp = sp + 0, which makes fp fun's own frame pointer, and then sp = sp - 20.
After these steps, both the fp register and sp point to new locations. What is odd here is that fun reserves 20 bytes of stack, although its only local variable is int c. Let's keep this question in mind and continue:
   10454:  e3a0300a  mov   r3, #10
   10458:  e50b3008  str   r3, [fp, #-8]
   1045c:  e51b2010  ldr   r2, [fp, #-16]
   10460:  e51b3014  ldr   r3, [fp, #-20]    ; 0xffffffec
   10464:  e0823003  add   r3, r2, r3
   10468:  e51b2008  ldr   r2, [fp, #-8]
   1046c:  e0030392  mul   r3, r2, r3
   10470:  e1a00003  mov   r0, r3
   10474:  e28bd000  add   sp, fp, #0
   10478:  e49db004  pop   {fp}              ; (ldr fp, [sp], #4)
   1047c:  e12fff1e  bx    lr
As you can see, fun first stores the values of r0 and r1 into its own stack frame, at fp - 16 and fp - 20. In other words, the parameters of the function are copied onto the stack, which together with c at fp - 8 explains why the frame holds more than a single int. The parameters are then loaded back from the stack into r2 and r3 with ldr, add and mul perform the computation, and the final result is placed in r0. Then comes sp = fp + 0, which is the key step: sp now has the value of fp, so sp points at the base of fun's stack frame. The next instruction is pop {fp}, which loads the word that sp currently points to into the fp register, so fp once again points at the base of main's stack frame. Because of the pop, sp is also increased by 4, back to the value it had before the call. Finally bx lr sets pc to the instruction after the bl, and execution continues there. In this way we return to main. The picture below illustrates this stage:
Back in main, let's continue reading:
   104b0:  e50b0014  str   r0, [fp, #-20]    ; 0xffffffec
   104b4:  e51b1014  ldr   r1, [fp, #-20]    ; 0xffffffec
   104b8:  e59f0010  ldr   r0, [pc, #16]     ; 104d0 <main+0x50>
   104bc:  ebffff89  bl    102e8 <printf@plt>
   104c0:  e3a03000  mov   r3, #0
   104c4:  e1a00003  mov   r0, r3
   104c8:  e24bd004  sub   sp, fp, #4
   104cc:  e8bd8800  pop   {fp, pc}
   104d0:  00010544  andeq r0, r1, r4, asr #10
Recall that res lives at fp - 20 and that fun left its return value in r0. So main first stores r0 to fp - 20, then loads res back into r1 before calling printf, and then loads the word at address pc + 16 into r0. That word, stored at 104d0, is 0x00010544, the address of the format string. objdump shows it as andeq only because it decodes the data as if it were an instruction.
One detail is worth knowing here: how does pc + 16 come out as 104d0? It is a consequence of the pipeline. When ldr r0, [pc, #16] executes, pc reads as the address of this instruction plus 8. So although the instruction itself is at 104b8, pc is 104b8 + 8 = 104c0, and adding 16 gives exactly 104d0. The reason is ARM's classic three-stage pipeline: pc points at the instruction being fetched, pc - 4 at the one being decoded, and pc - 8 at the one being executed.
Then bl branches to printf:
000102e8 <printf@plt>:
   102e8:  e28fc600  add   ip, pc, #0, 12       ; ip = pc + (0 ror 12) = pc
   102ec:  e28cca10  add   ip, ip, #16, 20      ; 0x10000
   102f0:  e5bcfd1c  ldr   pc, [ip, #3356]!     ; 0xd1c
This one is a bit harder to digest. It involves system calls and dynamic libraries, so let's look at it later.