Understanding aarch64 assembly function calls: how is the stack operated?

Question

test.c (bare-metal)

#include <stdio.h>

int add1(int a, int b)
{
    int c;
    c = a + b;
    return c;
}

int main()
{
    int x, y, z;
    x = 3;
    y = 4;
    z = add1(x, y);
    printf("z = %d\n", z);
}

I run aarch64-none-elf-gcc test.c -specs=rdimon.specs and get a.out. I then run aarch64-none-elf-objdump -d a.out to get the assembly code. Here are the add1 and main functions.

00000000004002e0 <add1>:
  4002e0:   d10083ff    sub sp, sp, #0x20       <-- reduce sp by 0x20 (just above it are saved fp and lr of main)
  4002e4:   b9000fe0    str w0, [sp, #12]       <-- save first param x at sp + 12
  4002e8:   b9000be1    str w1, [sp, #8]        <-- save second param y at sp + 8
  4002ec:   b9400fe1    ldr w1, [sp, #12]       <-- load w1 with x
  4002f0:   b9400be0    ldr w0, [sp, #8]        <-- load w0 with y
  4002f4:   0b000020    add w0, w1, w0          <-- w0 = w1 + w0
  4002f8:   b9001fe0    str w0, [sp, #28]       <-- store x0 to sp+28
  4002fc:   b9401fe0    ldr w0, [sp, #28]       <-- load w0 with the result (seems redundant)
  400300:   910083ff    add sp, sp, #0x20       <-- increment sp by 0x20
  400304:   d65f03c0    ret
0000000000400308 <main>:
  400308:   a9be7bfd    stp x29, x30, [sp, #-32]!   <-- save x29(fp) and x30(lr) at sp - 0x20
  40030c:   910003fd    mov x29, sp                 <-- set fp to new sp, the base of stack growth(down)
  400310:   52800060    mov w0, #0x3                    // #3
  400314:   b9001fe0    str w0, [sp, #28]           <-- x is assigned in sp + #28
  400318:   52800080    mov w0, #0x4                    // #4
  40031c:   b9001be0    str w0, [sp, #24]           <-- y is assigned in sp + #24
  400320:   b9401be1    ldr w1, [sp, #24]            <-- load func param for y
  400324:   b9401fe0    ldr w0, [sp, #28]           <-- load func param for x
  400328:   97ffffee    bl  4002e0 <add1>           <-- call add1 (args are in w0, w1)
  40032c:   b90017e0    str w0, [sp, #20]           <-- store x0(result z) to sp+20
  400330:   b94017e1    ldr w1, [sp, #20]           <-- load w1 with the result (why? seems redundant. it's already in w0)
  400334:   d0000060    adrp    x0, 40e000 <__sfp_handle_exceptions+0x28>
  400338:   91028000    add x0, x0, #0xa0  <-- looks like loading param x0 for printf
  40033c:   940000e7    bl  4006d8 <printf>
  400340:   52800000    mov w0, #0x0                    // #0 <-- for main's return value..
  400344:   a8c27bfd    ldp x29, x30, [sp], #32  <-- recover x29 and x30 (looks like the values in x29, x30 were used in the function that called main)
  400348:   d65f03c0    ret
  40034c:   d503201f    nop

I added my understanding with <-- marks. Could someone look at the code and give me some corrections? Any small comment will be appreciated. (Please start from <main>.)

ADD: Thanks for the comments. I think I forgot to ask my real questions. At the start of main, the program that called main should have put its return address (after main) in x30. And since main calls another function itself, it will modify x30, so it saves x30 on its stack. But why does it store it at sp - #0x20? And why are the variables x, y, z stored at sp + #20, sp + #24, sp + #28? If main calls printf, I guess sp and x29 will be decremented by some amount. Does this amount depend on how much stack area the called function (here printf) uses, or is it constant? And how is the x29, x30 storage location in main determined? Is it chosen so that those two values are located just above the called function's (printf's) stack area? Sorry for so many questions.

Recommended answer

In laying out the stack for main, the compiler has to satisfy the following constraints:

  • x29 and x30 need to be saved on the stack. They occupy 8 bytes each.

  • The local variables x, y, z need stack space, 4 bytes each. (If you were optimizing, you'd see them kept in registers instead, or optimized completely out of existence.) That brings us to a total of 8+8+4+4+4=28 bytes.

  • The stack pointer sp must always be kept aligned to 16 bytes; this is an architectural and ABI constraint (the OS can choose to relax this requirement but normally doesn't). So we can't just subtract 28 from sp; we must round up to the next multiple of 16, which is 32.

So that's where the 32, or 0x20, that you mention comes from. Note that it is entirely for stack memory used by main itself. It's not a universal constant; you would see it change if you added or removed enough local variables from main.
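The arithmetic above can be sketched in C. This is only a model of the frame described in this answer (16 bytes for x29 and x30 plus 4 bytes per int local, rounded up to 16), not GCC's actual frame-layout algorithm; the function names round_up_16 and frame_size_for_int_locals are made up for illustration.

```c
#include <stddef.h>

/* Round n up to the next multiple of 16, the alignment sp must keep. */
size_t round_up_16(size_t n)
{
    return (n + 15) & ~(size_t)15;
}

/*
 * Hypothetical model of a frame like main's: 8 bytes each for the saved
 * x29 and x30, plus 4 bytes per int local, rounded up to a multiple of 16.
 */
size_t frame_size_for_int_locals(size_t n_locals)
{
    size_t needed = 8 + 8 + 4 * n_locals;
    return round_up_16(needed);
}
```

Under this model, three int locals need 28 bytes and get a 32-byte frame. A fourth local still fits in 32 bytes, while a fifth pushes the frame to 48, which is the sense in which the size changes only when "enough" locals are added or removed.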

It has nothing to do with whatever printf needs. If printf needs stack space for its own local variables, then the code within printf will have to take care of adjusting the stack pointer accordingly. When compiling main, the compiler does not know how much space that would be, and does not care.

Now the compiler needs to organize these five objects x29, x30, x, y, z within the 32 bytes of stack space that it will create for itself. The choice of what to put where could be almost completely arbitrary, except for the following point.

The function's prologue needs to both subtract 32 from the stack pointer and store the registers x29, x30 somewhere within the allocated space. This can all be done in a single instruction with the pre-indexed store-pair instruction stp x29, x30, [sp, #-32]!. It subtracts 32 from sp, then stores x29 and x30 in the 16 bytes starting at the address where sp now points. So in order to use this instruction, we have to accept placing x29 and x30 at the bottom of the allocated space, at offsets [sp+0] and [sp+8] relative to the new value of sp. Putting them anywhere else would require extra instructions and be less efficient.
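To make the pre-indexed store concrete, here is a toy C model of what stp x29, x30, [sp, #-32]! and the matching epilogue instruction ldp x29, x30, [sp], #32 do to sp and memory. The struct cpu type, its fields, and both function names are invented for this sketch. sp is modelled as an index into a small byte array rather than a real address.

```c
#include <stdint.h>
#include <string.h>

/* Toy machine state: a small stack memory and the three registers involved. */
struct cpu {
    uint8_t  mem[256];
    uint64_t sp;   /* modelled as an index into mem, not a real address */
    uint64_t x29;
    uint64_t x30;
};

/* Models: stp x29, x30, [sp, #offset]!  (pre-indexed: adjust sp, then store) */
void stp_x29_x30_preindex(struct cpu *c, int64_t offset)
{
    c->sp += (uint64_t)offset;               /* sp = sp + offset (offset is -32) */
    memcpy(&c->mem[c->sp], &c->x29, 8);      /* x29 goes to [sp + 0] */
    memcpy(&c->mem[c->sp + 8], &c->x30, 8);  /* x30 goes to [sp + 8] */
}

/* Models: ldp x29, x30, [sp], #offset  (post-indexed: load, then adjust sp) */
void ldp_x29_x30_postindex(struct cpu *c, int64_t offset)
{
    memcpy(&c->x29, &c->mem[c->sp], 8);
    memcpy(&c->x30, &c->mem[c->sp + 8], 8);
    c->sp += (uint64_t)offset;               /* sp = sp + offset (offset is +32) */
}
```

Starting from sp = 128 in this model, the store leaves sp at 96 with x29 at index 96 and x30 at index 104. The load then restores both registers and puts sp back at 128.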

(In fact, because this is the most convenient way to do it, the ABI requires that stack frames be set up this way, with x29, x30 contiguous on the stack in that order, when they are used at all (5.2.3).)

We still have 16 bytes starting at [sp+16] to play with, in which x, y, z must be placed. The compiler has chosen to put them at addresses [sp+28], [sp+24], [sp+20] respectively. The 4 bytes at [sp+16] remain unused, but remember, we had to waste 4 bytes somewhere in order to achieve the proper stack alignment. The arrangement of these objects, and the choice of which slot to leave unused, was completely arbitrary; any other arrangement would have worked just as well.
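One way to picture the resulting 32-byte frame is as a C struct whose first member sits at the address held in sp after the prologue. The struct main_frame type and its field names are made up for this sketch; the compiler does not generate any such type, and the layout shown is only the one observed in this particular disassembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative picture of main's frame, lowest address (new sp) first. */
struct main_frame {
    uint64_t saved_x29;  /* [sp + 0]  caller's frame pointer */
    uint64_t saved_x30;  /* [sp + 8]  return address into main's caller */
    uint32_t unused;     /* [sp + 16] padding that makes the total 32 */
    int32_t  z;          /* [sp + 20] */
    int32_t  y;          /* [sp + 24] */
    int32_t  x;          /* [sp + 28] */
};

/* The frame must be a multiple of 16 bytes; here it is exactly 32. */
_Static_assert(sizeof(struct main_frame) == 32, "frame should be 32 bytes");
```

The offsets in the comments match the str and ldr instructions in the disassembly: x at [sp, #28], y at [sp, #24], and z at [sp, #20].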
