Why does the x86-64 GCC function prologue allocate less stack than the local variables?


Question

Consider the following simple program:

int main(int argc, char **argv)
{
        char buffer[256];

        buffer[0] = 0x41;
        buffer[128] = 0x41;
        buffer[255] = 0x41;

        return 0;
}

Compiled with GCC 4.7.0 on an x86-64 machine. Disassembling main() with GDB gives:

0x00000000004004cc <+0>:     push   rbp
0x00000000004004cd <+1>:     mov    rbp,rsp
0x00000000004004d0 <+4>:     sub    rsp,0x98
0x00000000004004d7 <+11>:    mov    DWORD PTR [rbp-0x104],edi
0x00000000004004dd <+17>:    mov    QWORD PTR [rbp-0x110],rsi
0x00000000004004e4 <+24>:    mov    BYTE PTR [rbp-0x100],0x41
0x00000000004004eb <+31>:    mov    BYTE PTR [rbp-0x80],0x41
0x00000000004004ef <+35>:    mov    BYTE PTR [rbp-0x1],0x41
0x00000000004004f3 <+39>:    mov    eax,0x0
0x00000000004004f8 <+44>:    leave  
0x00000000004004f9 <+45>:    ret    

Why does it subtract only 0x98 (152 decimal) from rsp when the buffer is 256 bytes? When I mov data into buffer[0], it simply seems to use memory outside the allocated stack frame, addressed through rbp. So what is even the point of the sub rsp,0x98?

Another question: what do these lines do?

0x00000000004004d7 <+11>:    mov    DWORD PTR [rbp-0x104],edi
0x00000000004004dd <+17>:    mov    QWORD PTR [rbp-0x110],rsi

Why is EDI saved rather than RDI? I can see that it is stored outside the maximum range of the buffer allocated in the C code. Also of interest is why the delta between the two variables is so big. Since EDI is just 4 bytes, why do the two variables need a 12-byte separation?

Recommended answer

The x86-64 ABI used by Linux (and some other OSes, although notably not Windows, which has its own different ABI) defines a "red zone" of 128 bytes below the stack pointer, which is guaranteed not to be touched by signal or interrupt handlers. (See figure 3.3 and §3.2.2.)

A leaf function (i.e. one which does not call anything else) may therefore use this area for whatever it wants. It isn't doing anything like a call, which would place data at the stack pointer, and any signal or interrupt handler will follow the ABI and drop the stack pointer by at least an additional 128 bytes before storing anything.

(Shorter instruction encodings are available for signed 8-bit displacements, so the point of the red zone is that it increases the amount of local data that a leaf function can access using these shorter instructions.)

That's what's happening here.
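To make the arithmetic concrete, here is a minimal sketch in C that checks the numbers read off the question's disassembly. The helper names (usable_below_rbp, frame_fits) are hypothetical and exist only for this illustration; the sketch assumes the "push rbp; mov rbp,rsp; sub rsp,N" prologue shown in the question.

```c
#include <assert.h>
#include <stdbool.h>

/* Size of the red zone in the x86-64 ABI used by Linux, in bytes. */
enum { RED_ZONE_BYTES = 128 };

/*
 * After "mov rbp,rsp; sub rsp,N" we have rsp == rbp - N, so a leaf
 * function may touch memory down to rbp - (N + 128).
 * (Hypothetical helper, for illustration only.)
 */
static long usable_below_rbp(long sub_rsp)
{
    assert(sub_rsp >= 0);
    return sub_rsp + RED_ZONE_BYTES;
}

/* True if the lowest store, at [rbp - lowest_offset], is still covered. */
static bool frame_fits(long sub_rsp, long lowest_offset)
{
    return lowest_offset <= usable_below_rbp(sub_rsp);
}
```

With the values from the question, usable_below_rbp(0x98) is 0x118, and the lowest store in the listing is at [rbp-0x110], so frame_fits(0x98, 0x110) holds: the buffer and the two saved arguments all sit inside the allocated 0x98 bytes plus the red zone.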

But... this code isn't making use of those shorter encodings (it's using offsets from rbp rather than rsp). Why not? It's also saving edi and rsi completely unnecessarily. You ask why it's saving edi instead of rdi, but why is it saving it at all?

The answer is that the compiler is generating really crummy code, because no optimisations are enabled. If you enable any optimisation, your entire function is likely to collapse down to:

mov eax, 0
ret

because that's really all it needs to do: buffer[] is local, so the changes made to it will never be visible to anything else and can be optimised away. Beyond that, all the function needs to do is return 0.

So, here's a better example. This function is complete nonsense, but it makes use of a similar array, whilst doing enough to ensure that things don't all get optimised away:

$ cat test.c
int foo(char *bar)
{
    char tmp[256];
    int i;

    for (i = 0; bar[i] != 0; i++)
      tmp[i] = bar[i] + i;

    return tmp[1] + tmp[200];
}

Compiled with some optimisation, you can see similar use of the red zone, except that this time it really does use offsets from rsp:

$ gcc -m64 -O1 -c test.c
$ objdump -Mintel -d test.o

test.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <foo>:
   0:   53                      push   rbx
   1:   48 81 ec 88 00 00 00    sub    rsp,0x88
   8:   0f b6 17                movzx  edx,BYTE PTR [rdi]
   b:   84 d2                   test   dl,dl
   d:   74 26                   je     35 <foo+0x35>
   f:   4c 8d 44 24 88          lea    r8,[rsp-0x78]
  14:   48 8d 4f 01             lea    rcx,[rdi+0x1]
  18:   4c 89 c0                mov    rax,r8
  1b:   89 c3                   mov    ebx,eax
  1d:   44 28 c3                sub    bl,r8b
  20:   89 de                   mov    esi,ebx
  22:   01 f2                   add    edx,esi
  24:   88 10                   mov    BYTE PTR [rax],dl
  26:   0f b6 11                movzx  edx,BYTE PTR [rcx]
  29:   48 83 c0 01             add    rax,0x1
  2d:   48 83 c1 01             add    rcx,0x1
  31:   84 d2                   test   dl,dl
  33:   75 e6                   jne    1b <foo+0x1b>
  35:   0f be 54 24 50          movsx  edx,BYTE PTR [rsp+0x50]
  3a:   0f be 44 24 89          movsx  eax,BYTE PTR [rsp-0x77]
  3f:   8d 04 02                lea    eax,[rdx+rax*1]
  42:   48 81 c4 88 00 00 00    add    rsp,0x88
  49:   5b                      pop    rbx
  4a:   c3                      ret    


Now let's tweak it very slightly, by inserting a call to another function, so that foo() is no longer a leaf function:

$ cat test.c
extern void dummy(void);  /* ADDED */

int foo(char *bar)
{
    char tmp[256];
    int i;

    for (i = 0; bar[i] != 0; i++)
      tmp[i] = bar[i] + i;

    dummy();  /* ADDED */

    return tmp[1] + tmp[200];
}

Now the red zone cannot be used, so you see something more like what you originally expected:

$ gcc -m64 -O1 -c test.c
$ objdump -Mintel -d test.o

test.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <foo>:
   0:   53                      push   rbx
   1:   48 81 ec 00 01 00 00    sub    rsp,0x100
   8:   0f b6 17                movzx  edx,BYTE PTR [rdi]
   b:   84 d2                   test   dl,dl
   d:   74 24                   je     33 <foo+0x33>
   f:   49 89 e0                mov    r8,rsp
  12:   48 8d 4f 01             lea    rcx,[rdi+0x1]
  16:   48 89 e0                mov    rax,rsp
  19:   89 c3                   mov    ebx,eax
  1b:   44 28 c3                sub    bl,r8b
  1e:   89 de                   mov    esi,ebx
  20:   01 f2                   add    edx,esi
  22:   88 10                   mov    BYTE PTR [rax],dl
  24:   0f b6 11                movzx  edx,BYTE PTR [rcx]
  27:   48 83 c0 01             add    rax,0x1
  2b:   48 83 c1 01             add    rcx,0x1
  2f:   84 d2                   test   dl,dl
  31:   75 e6                   jne    19 <foo+0x19>
  33:   e8 00 00 00 00          call   38 <foo+0x38>
  38:   0f be 94 24 c8 00 00    movsx  edx,BYTE PTR [rsp+0xc8]
  3f:   00 
  40:   0f be 44 24 01          movsx  eax,BYTE PTR [rsp+0x1]
  45:   8d 04 02                lea    eax,[rdx+rax*1]
  48:   48 81 c4 00 01 00 00    add    rsp,0x100
  4f:   5b                      pop    rbx
  50:   c3                      ret    

(Note that tmp[200] was in range of a signed 8-bit displacement in the first case, but is not in this one.)
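As a rough check of that note, the following C sketch recomputes the rsp-relative displacements from the two listings. The helper names (fits_disp8, tmp_disp) are hypothetical; the base offsets are taken from the disassembly, where tmp[0] sits at rsp-0x78 in the leaf version and at rsp+0 in the version that calls dummy().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if disp can use the short, signed 8-bit displacement encoding. */
static bool fits_disp8(long disp)
{
    return disp >= INT8_MIN && disp <= INT8_MAX;
}

/*
 * rsp-relative displacement of tmp[index], given where tmp[0] sits
 * relative to rsp. (Hypothetical helper, for illustration only.)
 */
static long tmp_disp(long base_from_rsp, long index)
{
    return base_from_rsp + index;
}
```

In the leaf version, tmp_disp(-0x78, 200) is 0x50, which matches the [rsp+0x50] operand and fits in 8 bits. In the non-leaf version, tmp_disp(0, 200) is 0xc8, which exceeds 127, so the instruction at offset 38 needs the longer 32-bit displacement form.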
