Why does ARM gcc push registers r3 and lr onto the stack at the beginning of a function?

Problem description

I wrote a simple test program (main.c):

void test(){
}
void main(){
    test();
}

Then I compiled it with arm-none-eabi-gcc and used objdump to get the assembly code:

arm-none-eabi-gcc -g -fno-defer-pop -fomit-frame-pointer -c main.c
arm-none-eabi-objdump -S main.o > output

The assembly code pushes the r3 and lr registers, even though the function does nothing.

main.o:     file format elf32-littlearm

Disassembly of section .text:

00000000 <test>:
void test(){
}
   0:   e12fff1e        bx      lr

00000004 <main>:
void main(){
   4:   e92d4008        push    {r3, lr}
        test();
   8:   ebfffffe        bl      0 <test>
}
   c:   e8bd4008        pop     {r3, lr}
  10:   e12fff1e        bx      lr

My question is: why does ARM gcc choose to push r3 onto the stack, even though test() never uses it? Does gcc just pick one register at random to push? If it is to satisfy the stack alignment requirement (8 bytes for ARM), why not just subtract from sp? Thanks.

==================Update==========================

@KemyLand Regarding your answer, I have another example. The source code is:

void test1(){
}
void test(int i){
        test1();
}
void main(){
        test(1);
}

I used the same compile command as above and got the following assembly:

main.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <test1>:
void test1(){
}
   0:   e12fff1e        bx      lr

00000004 <test>:
void test(int i){
   4:   e52de004        push    {lr}            ; (str lr, [sp, #-4]!)
   8:   e24dd00c        sub     sp, sp, #12
   c:   e58d0004        str     r0, [sp, #4]
        test1();
  10:   ebfffffe        bl      0 <test1>
}
  14:   e28dd00c        add     sp, sp, #12
  18:   e49de004        pop     {lr}            ; (ldr lr, [sp], #4)
  1c:   e12fff1e        bx      lr

00000020 <main>:
void main(){
  20:   e92d4008        push    {r3, lr}
        test(1);
  24:   e3a00001        mov     r0, #1
  28:   ebfffffe        bl      4 <test>
}
  2c:   e8bd4008        pop     {r3, lr}
  30:   e12fff1e        bx      lr

If push {r3, lr} in the first example is there to use fewer instructions, why didn't the compiler use just one instruction in this function test()?

push {r0, lr}

It uses 3 instructions instead of 1:

push {lr}
sub sp, sp, #12
str r0, [sp, #4]

By the way, why does it subtract 12 from sp? The stack is 8-byte aligned, so couldn't it just subtract 4?

Solution

According to the Standard ARM Embedded ABI (infocenter.arm.com/help/topic/com.arm.doc.ihi0036b/IHI0036B_bsabi.pdf), r0 through r3 are used to pass arguments to a function and to hold its return value, while lr (a.k.a. r14) is the link register, whose purpose is to hold the return address of a function.

Clearly lr must be saved: the bl to test() overwrites it, and otherwise main() would have no way to return to its caller.

It is worth noting that every ARM instruction (in ARM state) is 32 bits wide and, as you mentioned, ARM requires the call stack to be 8-byte aligned. As a bonus, we're using the embedded ARM ABI, so code size matters. It is therefore more efficient to use a single 32-bit instruction that both saves lr and keeps the stack aligned by pushing an otherwise unused register (r3 holds nothing that needs preserving, because test() takes no arguments and returns nothing), and then to pop both with a single 32-bit instruction, than to add extra instructions (and waste precious memory) manipulating the stack pointer.
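
The alignment arithmetic behind this can be checked with a small C sketch. It assumes sp is 8-byte aligned on entry to the function and that each pushed register occupies 4 bytes; the helper names push_bytes and keeps_8byte_alignment are invented for illustration and are not part of GCC or any ABI.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bytes by which sp moves when `nregs` 32-bit registers are pushed. */
static uint32_t push_bytes(unsigned nregs)
{
    return 4u * nregs;
}

/* True if moving sp by `bytes` keeps it 8-byte aligned,
 * assuming (hypothetically) that sp was 8-byte aligned beforehand. */
static bool keeps_8byte_alignment(uint32_t bytes)
{
    return bytes % 8u == 0u;
}
```

Under these assumptions, push {lr} alone moves sp by 4 bytes and breaks the alignment, while push {r3, lr} moves it by 8 and preserves it. For test() in the update, push {lr} followed by sub sp, sp, #12 moves sp by 16 bytes in total, which is also a multiple of 8.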

In short, it is reasonable to conclude that this is just an optimization by GCC.
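
As a cross-check on the disassembly in the question, the register list of a multi-register push or pop can be read from the low 16 bits of the ARM encoding, where bit n selects register rn. The sketch below is only illustrative: reglist, has_reg, and reg_count are hypothetical helper names, and it does not apply to the single-register form e52de004, which objdump shows as a str.

```c
#include <stdint.h>

/* Low 16 bits of an ARM-state load/store-multiple encoding:
 * bit n set means register rn is in the list. */
static uint16_t reglist(uint32_t insn)
{
    return (uint16_t)(insn & 0xFFFFu);
}

/* Nonzero if register r (0..15) is in the instruction's register list. */
static int has_reg(uint32_t insn, unsigned r)
{
    return (reglist(insn) >> r) & 1u;
}

/* Number of registers transferred by the instruction. */
static unsigned reg_count(uint32_t insn)
{
    unsigned count = 0;
    for (unsigned r = 0; r < 16; r++) {
        count += (unsigned)has_reg(insn, r);
    }
    return count;
}
```

For 0xe92d4008 (the push in main) the list is 0x4008, so bits 3 and 14 are set: r3 and r14, i.e. lr. Two registers of 4 bytes each give the 8-byte adjustment discussed above.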
