Why does the x86-64 GCC function prologue allocate less stack than the local variables?


Question

Consider the following simple program:

int main(int argc, char **argv)
{
        char buffer[256];

        buffer[0] = 0x41;
        buffer[128] = 0x41;
        buffer[255] = 0x41;

        return 0;
}

Compiled with GCC 4.7.0 on an x86-64 machine. Disassembly of main() with GDB gives:

0x00000000004004cc <+0>:     push   rbp
0x00000000004004cd <+1>:     mov    rbp,rsp
0x00000000004004d0 <+4>:     sub    rsp,0x98
0x00000000004004d7 <+11>:    mov    DWORD PTR [rbp-0x104],edi
0x00000000004004dd <+17>:    mov    QWORD PTR [rbp-0x110],rsi
0x00000000004004e4 <+24>:    mov    BYTE PTR [rbp-0x100],0x41
0x00000000004004eb <+31>:    mov    BYTE PTR [rbp-0x80],0x41
0x00000000004004ef <+35>:    mov    BYTE PTR [rbp-0x1],0x41
0x00000000004004f3 <+39>:    mov    eax,0x0
0x00000000004004f8 <+44>:    leave  
0x00000000004004f9 <+45>:    ret    

Why does it subtract only 0x98 (152 decimal) from rsp when the buffer is 256 bytes? When I mov data into buffer[0], it simply seems to write outside of the allocated stack frame, using rbp as the reference, so what is even the point of the sub rsp,0x98?

Another question: what do these lines do?

0x00000000004004d7 <+11>:    mov    DWORD PTR [rbp-0x104],edi
0x00000000004004dd <+17>:    mov    QWORD PTR [rbp-0x110],rsi

Why does EDI and not RDI need to be saved? I also see that it is stored outside the maximum range of the buffer allocated in the C code. Also of interest is why the delta between the two variables is so big. Since EDI is just 4 bytes, why is there a 12-byte separation between the two variables?

Recommended answer

The x86-64 ABI used by Linux (and some other OSes, although notably not Windows, which has its own different ABI) defines a "red zone" of 128 bytes below the stack pointer, which is guaranteed not to be touched by signal or interrupt handlers. (See figure 3.3 and §3.2.2.)

A leaf function (i.e. one which does not call anything else) may therefore use this area for whatever it wants. It isn't doing anything like a call, which would place data at the stack pointer, and any signal or interrupt handler will follow the ABI and drop the stack pointer by at least an additional 128 bytes before storing anything.

(Shorter instruction encodings are available for signed 8-bit displacements, so the point of the red zone is that it increases the amount of local data that a leaf function can access using these shorter instructions.)

That's what's happening here.
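To make the arithmetic concrete, here is a small sketch in C using the numbers from the disassembly in the question. After mov rbp,rsp and sub rsp,0x98, the red zone extends the usable area a further 128 bytes below rsp, while the deepest store made by main() is at rbp-0x110. The constant and function names are made up for illustration.

```c
#include <stdbool.h>

/* Values read off the disassembly in the question (GCC 4.7.0, no optimisation). */
enum {
    RED_ZONE_BYTES  = 128,   /* red zone size defined by the ABI */
    ALLOCATED_BYTES = 0x98,  /* sub rsp,0x98 (152 decimal) */
    LOWEST_OFFSET   = 0x110  /* deepest store: [rbp-0x110] (272 decimal) */
};

/* Bytes below rbp that the function may use: the explicitly
   allocated area plus the red zone beneath rsp. */
int usable_bytes(void)
{
    return ALLOCATED_BYTES + RED_ZONE_BYTES;
}

/* True if every store made by main() lands in the allocated
   area or in the red zone. */
bool frame_fits(void)
{
    return LOWEST_OFFSET <= usable_bytes();
}
```

With these numbers, 152 + 128 = 280 bytes are usable and only 272 are needed, so the 256-byte buffer plus the saved copies of edi and rsi fit without a larger sub.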

But... this code isn't making use of those shorter encodings (it's using offsets from rbp rather than rsp). Why not? It's also saving edi and rsi completely unnecessarily - you ask why it's saving edi instead of rdi (argc is an int, so only the 32-bit edi is needed), but why is it saving it at all?

The answer is that the compiler is generating really crummy code, because no optimisations are enabled. If you enable any optimisation, your entire function is likely to collapse down to:

mov eax, 0
ret

because that's really all it needs to do: buffer[] is local, so the changes made to it will never be visible to anything else and can be optimised away; beyond that, all the function needs to do is return 0.
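As an aside, one way to keep the original stores in optimised output is to declare the array volatile, because the compiler has to perform every access to a volatile object. This is only a sketch of that alternative, not the approach this answer takes, and the function name touch_buffer is made up for illustration:

```c
/* Hypothetical variant of the question's main(): volatile forces the
   compiler to keep every store to, and load from, buffer. */
int touch_buffer(void)
{
    volatile char buffer[256];

    buffer[0] = 0x41;
    buffer[128] = 0x41;
    buffer[255] = 0x41;

    /* Read two elements back so the result depends on the stores. */
    return buffer[0] + buffer[255];
}
```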

So, here's a better example. This function is complete nonsense, but makes use of a similar array, whilst doing enough to ensure that things don't all get optimised away:

$ cat test.c
int foo(char *bar)
{
    char tmp[256];
    int i;

    for (i = 0; bar[i] != 0; i++)
      tmp[i] = bar[i] + i;

    return tmp[1] + tmp[200];
}

Compiled with some optimisation, you can see similar use of the red zone, except this time it really does use offsets from rsp:

$ gcc -m64 -O1 -c test.c
$ objdump -Mintel -d test.o

test.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <foo>:
   0:   53                      push   rbx
   1:   48 81 ec 88 00 00 00    sub    rsp,0x88
   8:   0f b6 17                movzx  edx,BYTE PTR [rdi]
   b:   84 d2                   test   dl,dl
   d:   74 26                   je     35 <foo+0x35>
   f:   4c 8d 44 24 88          lea    r8,[rsp-0x78]
  14:   48 8d 4f 01             lea    rcx,[rdi+0x1]
  18:   4c 89 c0                mov    rax,r8
  1b:   89 c3                   mov    ebx,eax
  1d:   44 28 c3                sub    bl,r8b
  20:   89 de                   mov    esi,ebx
  22:   01 f2                   add    edx,esi
  24:   88 10                   mov    BYTE PTR [rax],dl
  26:   0f b6 11                movzx  edx,BYTE PTR [rcx]
  29:   48 83 c0 01             add    rax,0x1
  2d:   48 83 c1 01             add    rcx,0x1
  31:   84 d2                   test   dl,dl
  33:   75 e6                   jne    1b <foo+0x1b>
  35:   0f be 54 24 50          movsx  edx,BYTE PTR [rsp+0x50]
  3a:   0f be 44 24 89          movsx  eax,BYTE PTR [rsp-0x77]
  3f:   8d 04 02                lea    eax,[rdx+rax*1]
  42:   48 81 c4 88 00 00 00    add    rsp,0x88
  49:   5b                      pop    rbx
  4a:   c3                      ret    
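Reading the offsets off this listing: tmp starts at rsp-0x78 (the lea at offset f), so 0x78 = 120 bytes of the array sit in the red zone and the remaining 0x88 = 136 bytes sit in the explicitly allocated area. The sketch below recomputes the displacements that appear in the listing; the constant and helper names are made up for illustration.

```c
/* Layout of tmp[] in the -O1 leaf version of foo(), taken from the listing. */
enum {
    TMP_BASE  = -0x78,  /* lea r8,[rsp-0x78]: tmp[0] is 120 bytes below rsp */
    TMP_SIZE  = 256,    /* char tmp[256] */
    ALLOCATED = 0x88    /* sub rsp,0x88 (136 decimal) */
};

/* Displacement from rsp of tmp[index] in that listing. */
int tmp_displacement(int index)
{
    return TMP_BASE + index;
}
```

For example, tmp_displacement(1) gives -0x77 and tmp_displacement(200) gives 0x50, matching the two movsx instructions, and the last element lands at 0x87, just inside the 0x88 bytes that were allocated.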

Now let's tweak that a bit by inserting a call to another function, so that foo() is no longer a leaf function:

$ cat test.c
extern void dummy(void);  /* ADDED */

int foo(char *bar)
{
    char tmp[256];
    int i;

    for (i = 0; bar[i] != 0; i++)
      tmp[i] = bar[i] + i;

    dummy();  /* ADDED */

    return tmp[1] + tmp[200];
}

Now the red zone cannot be used, so you see something more like you originally expected:

$ gcc -m64 -O1 -c test.c
$ objdump -Mintel -d test.o

test.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <foo>:
   0:   53                      push   rbx
   1:   48 81 ec 00 01 00 00    sub    rsp,0x100
   8:   0f b6 17                movzx  edx,BYTE PTR [rdi]
   b:   84 d2                   test   dl,dl
   d:   74 24                   je     33 <foo+0x33>
   f:   49 89 e0                mov    r8,rsp
  12:   48 8d 4f 01             lea    rcx,[rdi+0x1]
  16:   48 89 e0                mov    rax,rsp
  19:   89 c3                   mov    ebx,eax
  1b:   44 28 c3                sub    bl,r8b
  1e:   89 de                   mov    esi,ebx
  20:   01 f2                   add    edx,esi
  22:   88 10                   mov    BYTE PTR [rax],dl
  24:   0f b6 11                movzx  edx,BYTE PTR [rcx]
  27:   48 83 c0 01             add    rax,0x1
  2b:   48 83 c1 01             add    rcx,0x1
  2f:   84 d2                   test   dl,dl
  31:   75 e6                   jne    19 <foo+0x19>
  33:   e8 00 00 00 00          call   38 <foo+0x38>
  38:   0f be 94 24 c8 00 00    movsx  edx,BYTE PTR [rsp+0xc8]
  3f:   00 
  40:   0f be 44 24 01          movsx  eax,BYTE PTR [rsp+0x1]
  45:   8d 04 02                lea    eax,[rdx+rax*1]
  48:   48 81 c4 00 01 00 00    add    rsp,0x100
  4f:   5b                      pop    rbx
  50:   c3                      ret    

(Note that tmp[200] was in range of a signed 8-bit displacement in the first case, but is not in this one.)
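The range check behind that note can be sketched in a few lines of C. The helper name fits_disp8 is made up for illustration; the displacements tested against it come from the two listings above.

```c
#include <stdbool.h>

/* A displacement can use the short one-byte encoding only if it
   fits in a signed 8-bit value, i.e. lies in [-128, 127]. */
bool fits_disp8(int displacement)
{
    return displacement >= -128 && displacement <= 127;
}
```

In the leaf version tmp[200] is at rsp+0x50, which fits, and the movsx is 5 bytes long (0f be 54 24 50). In the non-leaf version it is at rsp+0xc8, which does not fit, so the same instruction needs a 32-bit displacement and grows to 8 bytes (0f be 94 24 c8 00 00 00).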
