为什么 GCC 在堆栈上推送一个额外的返回地址? [英] Why is GCC pushing an extra return address on the stack?

查看:23
本文介绍了为什么 GCC 在堆栈上推送一个额外的返回地址?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我目前正在学习汇编的基础知识,在查看 GCC(6.1.1) 生成的指令时遇到了一些奇怪的问题.

I am currently learning the basics of assembly and came across something odd when looking at the instructions generated by GCC(6.1.1).

这是来源:

#include <stdio.h>

int foo(int x, int y){
    return x*y;
}

int main(){
    int a = 5;
    int b = foo(a, 0xF00D);
    printf("0x%X
", b);
    return 0;
}

用于编译的命令:gcc -m32 -g test.c -o test

在检查 GDB 中的函数时,我得到了这个:

When examining the functions in GDB I get this:

(gdb) set disassembly-flavor intel
(gdb) disas main
Dump of assembler code for function main:
   0x080483f7 <+0>:     lea    ecx,[esp+0x4]
   0x080483fb <+4>:     and    esp,0xfffffff0
   0x080483fe <+7>:     push   DWORD PTR [ecx-0x4]
   0x08048401 <+10>:    push   ebp
   0x08048402 <+11>:    mov    ebp,esp
   0x08048404 <+13>:    push   ecx
   0x08048405 <+14>:    sub    esp,0x14
   0x08048408 <+17>:    mov    DWORD PTR [ebp-0xc],0x5
   0x0804840f <+24>:    push   0xf00d
   0x08048414 <+29>:    push   DWORD PTR [ebp-0xc]
   0x08048417 <+32>:    call   0x80483eb <foo>
   0x0804841c <+37>:    add    esp,0x8
   0x0804841f <+40>:    mov    DWORD PTR [ebp-0x10],eax
   0x08048422 <+43>:    sub    esp,0x8
   0x08048425 <+46>:    push   DWORD PTR [ebp-0x10]
   0x08048428 <+49>:    push   0x80484d0
   0x0804842d <+54>:    call   0x80482c0 <printf@plt>
   0x08048432 <+59>:    add    esp,0x10
   0x08048435 <+62>:    mov    eax,0x0
   0x0804843a <+67>:    mov    ecx,DWORD PTR [ebp-0x4]
   0x0804843d <+70>:    leave  
   0x0804843e <+71>:    lea    esp,[ecx-0x4]
   0x08048441 <+74>:    ret    
End of assembler dump.
(gdb) disas foo
Dump of assembler code for function foo:
   0x080483eb <+0>:     push   ebp
   0x080483ec <+1>:     mov    ebp,esp
   0x080483ee <+3>:     mov    eax,DWORD PTR [ebp+0x8]
   0x080483f1 <+6>:     imul   eax,DWORD PTR [ebp+0xc]
   0x080483f5 <+10>:    pop    ebp
   0x080483f6 <+11>:    ret    
End of assembler dump.

让我困惑的部分是它试图用堆栈做什么.根据我的理解,这就是它的作用:

The part that confuses me is what it is trying to do with the stack. From my understanding this is what it does:

  1. 它需要引用堆栈中高于 4 个字节的内存地址,据我所知,这应该是传递给 main 的变量,因为 esp 当前指向内存中的返回地址.
  2. 出于性能原因,它将堆栈与 0 边界对齐.
  3. 它压入新的堆栈区域ecx+4,这应该转化为将我们假设要返回的地址压入堆栈.
  4. 它将旧的帧指针压入堆栈并设置新的.
  5. 它将 ecx(它仍然指向应该是 main 的参数)压入堆栈.
  1. It takes a reference to some memory address 4 bytes higher in the stack which from my knowledge should be the variables passed to main since esp currently pointed to the return address in memory.
  2. It aligns the stack to a 0 boundary for performance reasons.
  3. It pushes onto the new stack area ecx+4 which should translate to pushing the address we are suppose to be returning to on the stack.
  4. It pushes the old frame pointer onto the stack and sets up the new one.
  5. It pushes ecx (which is still pointing to would should be an argument to main) onto the stack.

然后程序做它应该做的事情并开始返回的过程:

Then the program does what it should and begins the process of returning:

  1. 它通过在应该访问第一个局部变量的 ebp 上使用 -0x4 偏移量来恢复 ecx.
  2. 它执行 leave 指令,实际上只是将 esp 设置为 ebp,然后从堆栈中弹出 ebp.
  1. It restores ecx by using a -0x4 offset on ebp which should access the first local variable.
  2. It executes the leave instruction which really just sets esp to ebp and then pops ebp from the stack.

所以现在堆栈上的下一个是返回地址,esp 和 ebp 寄存器应该回到他们需要返回的位置?

So now the next thing on the stack is the return address and the esp and ebp registers should be back to what they need to be to return right?

显然不是因为它接下来要做的是用 ecx-0x4 加载 esp ,因为 ecx 仍然指向传递的那个变量to main 应该把它放在栈上返回地址的地址.

Well evidently not because the next thing it does is load esp with ecx-0x4 which since ecx is still pointing to that variable passed to main should put it at the address of return address on the stack.

这工作得很好,但提出了一个问题:为什么在步骤 3 中将返回地址放入堆栈中,因为它在真正从函数返回之前将堆栈返回到末尾的原始位置?

This works just fine but raises the question: why did it bother to put the return address onto the stack in step 3 since it returned the stack to the original position at the end just before actually returning from the function?

推荐答案

更新:gcc8 至少对正常用例(-fomit-frame-pointer,没有 alloca 或需要可变大小分配的 C99 VLA).可能是因为越来越多地使用 AVX,导致更多函数需要 32 字节对齐的本地或数组.

Update: gcc8 simplifies this at least for normal use-cases (-fomit-frame-pointer, and no alloca or C99 VLAs that require variable-size allocation). Perhaps motivated by increasing usage of AVX leading to more functions wanting a 32-byte aligned local or array.

此外,可能是 当 gcc 需要额外的堆栈对齐时,它的奇怪堆栈操作是怎么回事?

这个复杂的序言如果只运行几次就很好(例如在 32 位代码中的 main 的开头),但它看起来越多,优化它就越值得.GCC 有时仍然会在函数中过度对齐堆栈,其中所有 >16 字节对齐的对象都被优化到寄存器中,这已经是一个错过的优化,但当堆栈对齐更便宜时就不那么糟糕了.

This complicated prologue is fine if it only ever runs a couple times (e.g. at the start of main in 32-bit code), but the more it appears the more worthwhile it is to optimize it. GCC sometimes still over-aligns the stack in functions where all >16-byte aligned objects are optimized into registers, which is a missed optimization already but less bad when the stack alignment is cheaper.

gcc 在函数内对齐堆栈时会产生一些笨拙的代码,即使启用了优化.我有一个可能的理论(见下文),关于为什么 gcc 可能会将返回地址复制到它保存 ebp 的正上方以制作堆栈帧(是的,我同意这就是 gcc 正在做的).在这个函数中看起来没有必要,clang 没有做类似的事情.

gcc makes some clunky code when aligning the stack within a function, even with optimization enabled. I have a possible theory (see below) on why gcc might be copying the return address to just above where it saves ebp to make a stack frame (and yes, I agree that's what gcc is doing). It doesn't look necessary in this function, and clang doesn't do anything like that.

除此之外,ecx 的废话可能只是 gcc 没有优化掉它的 align-the-stack 样板中不需要的部分.(需要 esp 的预对齐值来引用堆栈上的 args,因此将第一个可能的 arg 的地址放入寄存器是有意义的).

Besides that, the nonsense with ecx is probably just gcc not optimizing away unneeded parts of its align-the-stack boilerplate. (The pre-alignment value of esp is needed to reference args on the stack, so it makes sense that it puts the address of the first would-be arg into a register).

您在 32 位代码中使用优化看到了同样的事情(其中 gcc 使 main 不假定 16B 堆栈对齐,即使当前版本的ABI 要求在进程启动时,调用 main 的 CRT 代码要么对齐堆栈本身,要么保留内核提供的初始对齐,我忘了).您还可以在将堆栈对齐到 16B 以上的函数中看到这一点(例如,使用 __m256 类型的函数,有时即使它们从不将它们溢出到堆栈中.或者使用 C++ 声明的数组的函数11 alignas(32),或任何其他请求对齐的方式.)在 64 位代码中,gcc 似乎总是为此使用 r10,而不是 rcx.

You see the same thing with optimization in 32-bit code (where gcc makes a main that doesn't assume 16B stack alignment, even though the current version of the ABI requires that at process startup, and the CRT code that calls main either aligns the stack itself or preserves the initial alignment provided by the kernel, I forget). You also see this in functions that align the stack to more than 16B (e.g. functions that use __m256 types, sometimes even if they never spill them to the stack. Or functions with an array declared with C++11 alignas(32), or any other way of requesting alignment.) In 64-bit code, gcc always seems to use r10 for this, not rcx.

gcc 的做法不需要 ABI 合规性,因为 clang 做的事情要简单得多.

There's nothing required for ABI compliance about the way gcc does it, because clang does something much simpler.

我添加了一个对齐的变量(使用 volatile 作为一种简单的方法来强制编译器在堆栈上实际为其保留对齐的空间,而不是将其优化掉).我把你的代码 在 Godbolt 编译器浏览器上,使用 -O3 查看 asm.我在 gcc 4.9、5.3 和 6.1 中看到了相同的行为,但在 clang 中看到了不同的行为.

I added an aligned variable (with volatile as a simple way to force the compiler to actually reserve aligned space for it on the stack, instead of optimizing it away). I put your code on the Godbolt compiler explorer, to look at the asm with -O3. I see the same behaviour from gcc 4.9, 5.3, and 6.1, but different behaviour with clang.

int main(){
    __attribute__((aligned(32))) volatile int v = 1;
    return 0;
}

Clang3.8 的 -O3 -m32 输出在功能上与其 -m64 输出相同.请注意,-O3 启用了 -fomit-frame-pointer,但某些函数无论如何都会生成堆栈帧.

Clang3.8's -O3 -m32 output is functionally identical to its -m64 output. Note that -O3 enables -fomit-frame-pointer, but some functions make stack frames anyway.

    push    ebp
    mov     ebp, esp                # make a stack frame *before* aligning, so ebp-relative addressing can only access stack args, not aligned locals.
    and     esp, -32
    sub     esp, 32                 # esp is 32B aligned with 32 or 48B above esp reserved (depending on incoming alignment)
    mov     dword ptr [esp], 1      # store v
    xor     eax, eax                # return 0
    mov     esp, ebp                # leave
    pop     ebp
    ret

gcc 的输出在 -m32-m64 之间几乎相同,但它把 v 放在 使用 -m64 所以-m32 输出有两个额外的指令:

gcc's output is nearly the same between -m32 and -m64, but it puts v in the red-zone with -m64 so the -m32 output has two extra instructions:

    # gcc 6.1 -m32 -O3 -fverbose-asm.  Most of gcc's comment lines are empty.  I guess that means it has no idea why it's emitting those insns :P
    lea     ecx, [esp+4]      #,   get a pointer to where the first arg would be
    and     esp, -32  #,          align
    xor     eax, eax  #           return 0
    push    DWORD PTR [ecx-4]       #  No clue WTF this is for; this looks batshit insane, but happens even in 64bit mode.
    push    ebp     #             make a stackframe, even though -fomit-frame-pointer is on by default and we can already restore the original esp from ecx (unlike clang)
    mov     ebp, esp  #,
    push    ecx     #             save the old esp value (even though this function doesn't clobber ecx...)
    sub     esp, 52   #,          reserve space for v  (not present with -m64)
    mov     DWORD PTR [ebp-56], 1     # v,
    add     esp, 52   #,          unreserve (not present with -m64)
    pop     ecx       #           restore ecx (even though nothing clobbered it)
    pop     ebp       #           at least it knows it can just pop instead of `leave`
    lea     esp, [ecx-4]      #,  restore pre-alignment esp
    ret

似乎gcc想要在对齐堆栈之后制作它的堆栈帧(使用push ebp).我想这是有道理的,所以它可以引用相对于 ebp 的局部变量.否则,它必须使用 esp 相对寻址,如果它想要对齐的局部变量.

It seems that gcc wants to make its stack frame (with push ebp) after aligning the stack. I guess that makes sense, so it can reference locals relative to ebp. Otherwise it would have to use esp-relative addressing, if it wanted aligned locals.

对齐后但在推送ebp之前的返回地址的额外副本意味着返回地址被复制到相对于保存的ebp值的预期位置(以及调用子函数时 ebp 中的值).因此,这确实有助于希望通过跟踪堆栈帧的链表并查看返回地址以找出所涉及的函数来展开堆栈的代码.

The extra copy of the return address after aligning but before pushing ebp means that the return address is copied to the expected place relative to the saved ebp value (and the value that will be in ebp when child functions are called). So this does potentially help code that wants to unwind the stack by following the linked list of stack frames, and looking at return-addresses to find out what function is involved.

我不确定这是否与允许使用 -fomit-frame-pointer 进行堆栈展开(回溯/异常处理)的现代堆栈展开信息有关.(它是 .eh_frame 部分中的元数据.这是围绕 esp 的每个修改的 .cfi_* 指令的用途.)我应该看看当必须在非叶函数中对齐堆栈时,clang 会做什么.

I'm not sure whether this matters with modern stack-unwind info that allows stack-unwinding (backtraces / exception handling) with -fomit-frame-pointer. (It's metadata in the .eh_frame section. This is what the .cfi_* directives around every modification to esp are for.) I should look at what clang does when it has to align the stack in a non-leaf function.

在函数内部需要 esp 的原始值来引用堆栈上的函数参数.我认为 gcc 不知道如何优化掉它的 align-the-stack 方法中不需要的部分.(例如 out main 不查看其参数(并声明不采用任何参数))

The original value of esp would be needed inside the function to reference function args on the stack. I think gcc doesn't know how to optimize away unneeded parts of its align-the-stack method. (e.g. out main doesn't look at its args (and is declared not to take any))

这种代码生成是您在需要对齐堆栈的函数中看到的典型代码;使用具有自动存储功能的 volatile 并不奇怪.

This kind of code-gen is typical of what you see in a function that needs to align the stack; it's not extra weird because of using a volatile with automatic storage.

这篇关于为什么 GCC 在堆栈上推送一个额外的返回地址?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆