Why does the x86-64 GCC function prologue allocate less stack than the local variables?
Question
Consider the following simple program:
int main(int argc, char **argv)
{
char buffer[256];
buffer[0] = 0x41;
buffer[128] = 0x41;
buffer[255] = 0x41;
return 0;
}
Compiled with GCC 4.7.0 on an x86-64 machine. Disassembly of main() with GDB gives:
0x00000000004004cc <+0>: push rbp
0x00000000004004cd <+1>: mov rbp,rsp
0x00000000004004d0 <+4>: sub rsp,0x98
0x00000000004004d7 <+11>: mov DWORD PTR [rbp-0x104],edi
0x00000000004004dd <+17>: mov QWORD PTR [rbp-0x110],rsi
0x00000000004004e4 <+24>: mov BYTE PTR [rbp-0x100],0x41
0x00000000004004eb <+31>: mov BYTE PTR [rbp-0x80],0x41
0x00000000004004ef <+35>: mov BYTE PTR [rbp-0x1],0x41
0x00000000004004f3 <+39>: mov eax,0x0
0x00000000004004f8 <+44>: leave
0x00000000004004f9 <+45>: ret
Why does it sub rsp by only 0x98 = 152 bytes when the buffer is 256 bytes? When I mov data into buffer[0], it simply seems to use memory outside the allocated stack frame, referenced through rbp, so what is even the point of the sub rsp,0x98?
Another question: what do these lines do?
0x00000000004004d7 <+11>: mov DWORD PTR [rbp-0x104],edi
0x00000000004004dd <+17>: mov QWORD PTR [rbp-0x110],rsi
Why does EDI, and not RDI, need to be saved? I do see, however, that it moves these outside the maximum range of the buffer allocated in the C code. Also of interest is why the delta between the two variables is so big: since EDI is just 4 bytes, why does it need a 12-byte separation between the two variables?
Recommended answer
The x86-64 ABI used by Linux (and some other OSes, although notably not Windows, which has its own different ABI) defines a "red zone" of 128 bytes below the stack pointer, which is guaranteed not to be touched by signal or interrupt handlers. (See figure 3.3 and §3.2.2.)
A leaf function (i.e. one which does not call anything else) may therefore use this area for whatever it wants: it isn't doing anything like a call which would place data at the stack pointer, and any signal or interrupt handler will follow the ABI and drop the stack pointer by at least an additional 128 bytes before storing anything.
(Shorter instruction encodings are available for signed 8-bit displacements, so the point of the red zone is that it increases the amount of local data that a leaf function can access using these shorter instructions.)
That's what's happening here.
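To make the arithmetic concrete, here is a minimal C sketch (the helper names are hypothetical, and it assumes the 128-byte red zone described above) that converts the rbp-relative offsets from the question's unoptimised listing into rsp-relative offsets after sub rsp,0x98, and checks that everything lands no lower than rsp-128.

```c
#include <assert.h>
#include <stdbool.h>

/* Red zone size in the x86-64 ABI used by Linux. */
#define RED_ZONE_BYTES 128L

/* After `push rbp; mov rbp,rsp; sub rsp,sub_amount`, rsp == rbp - sub_amount,
   so the address rbp+rbp_off is rsp+(rbp_off + sub_amount). */
static long rsp_offset(long rbp_off, long sub_amount) {
    assert(sub_amount >= 0);
    return rbp_off + sub_amount;
}

/* A leaf function may use memory from rsp-128 upwards. */
static bool leaf_may_use(long rsp_off) {
    return rsp_off >= -RED_ZONE_BYTES;
}

/* Smallest `sub rsp` amount that keeps the lowest used byte
   (rbp+lowest_rbp_off) inside the red zone. A lower bound only:
   this does not model how GCC actually sizes its frames. */
static long min_leaf_sub(long lowest_rbp_off) {
    long need = -lowest_rbp_off - RED_ZONE_BYTES;
    return need > 0 ? need : 0;
}
```

For the listing in the question, the lowest byte touched is the start of the saved rsi at [rbp-0x110]; rsp_offset(-0x110, 0x98) is -0x78, comfortably inside the red zone, and min_leaf_sub(-0x110) is 0x90. GCC's 0x98 is 8 bytes more than that bound, and the sketch makes no attempt to reproduce GCC's exact frame-size calculation.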
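But... this code isn't making use of those shorter encodings (it's using offsets from rbp rather than rsp). Why not? It's also saving edi and rsi completely unnecessarily: you ask why it's saving edi instead of rdi, but why is it saving it at all? (As for edi itself: argc is an int, so only the low 32 bits of rdi hold it, whereas argv is a pointer and needs all 64 bits of rsi.)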
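The question also asks about the 12-byte gap between the two saved arguments. One plausible reading, which this sketch assumes rather than takes from GCC's source, is that GCC at -O0 hands out slots downward from rbp and gives each its natural alignment: the 4-byte argc lands just below buffer[] at rbp-0x104, and the 8-byte argv is pushed down to the next 8-byte boundary at rbp-0x110. The helpers align_up and place_below are hypothetical.

```c
#include <assert.h>

/* Round x up to a multiple of align (which must be a power of two). */
static unsigned long align_up(unsigned long x, unsigned long align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/* Hypothetical model of -O0 slot allocation. `depth` is how many bytes
   below rbp are already in use; a new object of `size` bytes is placed
   below that point, at an `align`-aligned address. Assumes rbp itself is
   16-byte aligned, as it is after `push rbp` under the ABI.
   Returns the new depth: the object lives at [rbp - new_depth]. */
static unsigned long place_below(unsigned long depth, unsigned long size,
                                 unsigned long align) {
    return align_up(depth + size, align);
}
```

With buffer[] occupying the first 0x100 bytes, place_below(0x100, 4, 4) gives 0x104 for argc and place_below(0x104, 8, 8) gives 0x110 for argv, matching [rbp-0x104] and [rbp-0x110] in the listing. Under this model the 12-byte delta is argv's own 8 bytes plus 4 bytes of alignment padding.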
The answer is that the compiler is generating really crummy code, because no optimisations are enabled. If you enable any optimisation, your entire function is likely to collapse down to:
mov eax, 0
ret
because that's really all it needs to do: buffer[] is local, so the changes made to it will never be visible to anything else and can be optimised away; beyond that, all the function needs to do is return 0.
So, here's a better example. This function is complete nonsense, but makes use of a similar array, whilst doing enough to ensure that things don't all get optimised away:
$ cat test.c
int foo(char *bar)
{
char tmp[256];
int i;
for (i = 0; bar[i] != 0; i++)
tmp[i] = bar[i] + i;
return tmp[1] + tmp[200];
}
Compiled with some optimisation, you can see a similar use of the red zone, except that this time it really does use offsets from rsp:
$ gcc -m64 -O1 -c test.c
$ objdump -Mintel -d test.o
test.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <foo>:
0: 53 push rbx
1: 48 81 ec 88 00 00 00 sub rsp,0x88
8: 0f b6 17 movzx edx,BYTE PTR [rdi]
b: 84 d2 test dl,dl
d: 74 26 je 35 <foo+0x35>
f: 4c 8d 44 24 88 lea r8,[rsp-0x78]
14: 48 8d 4f 01 lea rcx,[rdi+0x1]
18: 4c 89 c0 mov rax,r8
1b: 89 c3 mov ebx,eax
1d: 44 28 c3 sub bl,r8b
20: 89 de mov esi,ebx
22: 01 f2 add edx,esi
24: 88 10 mov BYTE PTR [rax],dl
26: 0f b6 11 movzx edx,BYTE PTR [rcx]
29: 48 83 c0 01 add rax,0x1
2d: 48 83 c1 01 add rcx,0x1
31: 84 d2 test dl,dl
33: 75 e6 jne 1b <foo+0x1b>
35: 0f be 54 24 50 movsx edx,BYTE PTR [rsp+0x50]
3a: 0f be 44 24 89 movsx eax,BYTE PTR [rsp-0x77]
3f: 8d 04 02 lea eax,[rdx+rax*1]
42: 48 81 c4 88 00 00 00 add rsp,0x88
49: 5b pop rbx
4a: c3 ret
Now let's tweak it very slightly by inserting a call to another function, so that foo() is no longer a leaf function:
$ cat test.c
extern void dummy(void); /* ADDED */
int foo(char *bar)
{
char tmp[256];
int i;
for (i = 0; bar[i] != 0; i++)
tmp[i] = bar[i] + i;
dummy(); /* ADDED */
return tmp[1] + tmp[200];
}
Now the red zone cannot be used, so you see something more like you originally expected:
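Why does a call rule out the red zone? The call instruction itself stores the 8-byte return address just below the current rsp, exactly where red-zone data would live, and the callee is then free to use the memory below that as well. The toy model below is only an illustration under simplifying assumptions: a byte array stands in for the stack, array indices play the role of addresses, and toy_call is a hypothetical helper that mimics nothing but the return-address push of call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { TOY_STACK_BYTES = 512 };

/* A toy stack: mem[i] stands for the byte at address i; rsp is an index. */
struct toy_stack {
    uint8_t mem[TOY_STACK_BYTES];
    size_t rsp;
};

/* Mimics the stack effect of `call`: rsp -= 8, then the 8-byte return
   address is written to [rsp]. Control flow is not modelled. */
static void toy_call(struct toy_stack *s, uint64_t return_address) {
    assert(s->rsp >= sizeof return_address);
    s->rsp -= sizeof return_address;
    memcpy(&s->mem[s->rsp], &return_address, sizeof return_address);
}
```

If a caller had stored a local at [rsp-1] (the top of its red zone) and then made a call, toy_call overwrites that byte, while bytes at or above the original rsp, i.e. inside the space reserved by sub rsp, are untouched. That is why the non-leaf foo() reserves the full 0x100 bytes for tmp[] with sub rsp,0x100.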
$ gcc -m64 -O1 -c test.c
$ objdump -Mintel -d test.o
test.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <foo>:
0: 53 push rbx
1: 48 81 ec 00 01 00 00 sub rsp,0x100
8: 0f b6 17 movzx edx,BYTE PTR [rdi]
b: 84 d2 test dl,dl
d: 74 24 je 33 <foo+0x33>
f: 49 89 e0 mov r8,rsp
12: 48 8d 4f 01 lea rcx,[rdi+0x1]
16: 48 89 e0 mov rax,rsp
19: 89 c3 mov ebx,eax
1b: 44 28 c3 sub bl,r8b
1e: 89 de mov esi,ebx
20: 01 f2 add edx,esi
22: 88 10 mov BYTE PTR [rax],dl
24: 0f b6 11 movzx edx,BYTE PTR [rcx]
27: 48 83 c0 01 add rax,0x1
2b: 48 83 c1 01 add rcx,0x1
2f: 84 d2 test dl,dl
31: 75 e6 jne 19 <foo+0x19>
33: e8 00 00 00 00 call 38 <foo+0x38>
38: 0f be 94 24 c8 00 00 movsx edx,BYTE PTR [rsp+0xc8]
3f: 00
40: 0f be 44 24 01 movsx eax,BYTE PTR [rsp+0x1]
45: 8d 04 02 lea eax,[rdx+rax*1]
48: 48 81 c4 00 01 00 00 add rsp,0x100
4f: 5b pop rbx
50: c3 ret
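(Note that tmp[200] was in range of a signed 8-bit displacement in the first case, at [rsp+0x50], but is not in this one: it is now at [rsp+0xc8], and 0xc8 = 200 exceeds the disp8 maximum of 127, so that movsx needs a 32-bit displacement.)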
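The displacement point comes down to simple arithmetic. The sketch below uses hypothetical helper names and covers only the movsx r32, BYTE PTR [rsp+disp] form seen in the two listings, assuming a destination register that needs no REX prefix. It computes each element's displacement from rsp for the two frame layouts (tmp[] starting at rsp-0x78 in the leaf version and at rsp in the non-leaf one) and the resulting instruction length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if disp fits in a sign-extended 8-bit displacement (-128..127). */
static bool fits_disp8(long disp) {
    return disp >= -128 && disp <= 127;
}

/* Length of `movsx r32, BYTE PTR [rsp+disp]` with a non-REX destination:
   0F BE opcode (2) + ModRM (1) + SIB (1, always needed with an rsp base)
   + displacement (0 bytes if disp == 0, 1 if it fits in disp8, else 4). */
static int movsx_rsp_len(long disp) {
    assert(disp >= INT32_MIN && disp <= INT32_MAX);
    int disp_bytes = disp == 0 ? 0 : (fits_disp8(disp) ? 1 : 4);
    return 2 + 1 + 1 + disp_bytes;
}

/* Displacement from rsp of tmp[index], given tmp[0]'s offset from rsp. */
static long elem_disp(long tmp_base, long index) {
    return tmp_base + index;
}
```

In the leaf version elem_disp(-0x78, 200) is 0x50, so movsx_rsp_len gives 5 bytes (0f be 54 24 50); in the non-leaf version elem_disp(0, 200) is 0xc8, giving 8 bytes (0f be 94 24 c8 00 00 00), both matching the objdump output.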