32位x86组件中堆栈对齐的责任 [英] Responsibility of stack alignment in 32-bit x86 assembly
问题描述
我试图清楚地了解谁(调用方或被调用方)负责堆栈对齐. 64位汇编的情况很清楚,它是由 caller 调用的.
I am trying to get a clear picture of who (caller or callee) is reponsible of stack alignment. The case for 64-bit assembly is rather clear, that it is by caller.
请参阅系统V AMD64 ABI,第3.2.2节堆栈框架:
Referring to System V AMD64 ABI, section 3.2.2 The Stack Frame:
输入自变量区域的末尾应对齐16(32,如果 __m256在堆栈上传递)字节边界.
The end of the input argument area shall be aligned on a 16 (32, if __m256 is passed on stack) byte boundary.
换句话说,对于被调用函数的每个入口点,应该安全:
In other words, it should be safe to assume, that for every entry point of called function:
16 | (%rsp + 8)
保持(额外的8个是因为call
隐式地将返回地址压入堆栈).
holds (extra eight is because call
implicitely pushes return address on stack).
在32位环境中(假设cdecl)它的外观如何?我注意到gcc
使用以下构造将对齐方式放入被调用函数内:
How it looks in 32-bit world (assuming cdecl)? I noticed that gcc
places the alignment inside the called function with following construct:
and esp, -16
这似乎表明,这是被告人的责任.
which seems to indicate, that is callee's responsibility.
更清楚地说,请考虑以下NASM代码:
To put it clearer, consider following NASM code:
global main
extern printf
extern scanf
section .rodata
s_fmt db "%d %d", 0
s_res db `%d with remainder %d\n`, 0
section .text
main:
start 0, 0
sub esp, 8
mov DWORD [ebp-4], 0 ; dividend
mov DWORD [ebp-8], 0 ; divisor
lea eax, [ebp-8]
push eax
lea eax, [ebp-4]
push eax
push s_fmt
call scanf
add esp, 12
mov eax, [ebp-4]
cdq
idiv DWORD [ebp-8]
push edx
push eax
push s_res
call printf
xor eax, eax
leave
ret
在调用scanf
之前是否需要对齐堆栈?如果是这样,则在将这两个参数推为scanf
之前,需要将%esp
减少四个字节:
Is it required to align the stack before scanf
is called? If so, then this would require to decrease %esp
by four bytes before pushing these two arguments to scanf
as:
4 bytes (return address)
4 bytes (%ebp of previous stack frame)
8 bytes (for two variables)
12 bytes (three arguments for scanf)
= 28
推荐答案
仅GCC 在main
中进行了额外的堆栈对齐;该功能很特殊.,如果您查看其他任何功能的代码源,除非您拥有带有alignas(32)
或类似内容的本地语言,否则您将看不到它.
GCC only does this extra stack alignment in main
; that function is special. You won't see it if you look at code-gen for any other function, unless you have a local with alignas(32)
or something.
GCC只是对-m32
采取了防御性的方法,不假定main
是通过正确的16B对齐的堆栈调用的.或者,当-mpreferred-stack-boundary=4
只是一个好主意,而不是法律时,就留下了这种特殊待遇.
GCC is just taking a defensive approach with -m32
, by not assuming that main
is called with a properly 16B-aligned stack. Or this special treatment is left over from when -mpreferred-stack-boundary=4
was only a good idea, not the law.
i386 System V ABI多年来一直保证/要求ESP + 4在功能上进行16B对齐. (即ESP必须在CALL指令之前 对齐16B,因此堆栈上的args从16B边界开始.这与x86-64 System V相同.)
The i386 System V ABI has guaranteed/required for years that ESP+4 is 16B-aligned on entry to a function. (i.e. ESP must be 16B-aligned before a CALL instruction, so args on the stack start at a 16B boundary. This is the same as for x86-64 System V.)
ABI还保证新的32位进程以在16B边界上对齐的ESP开始(例如,在_start
,ELF入口点,ESP指向argc,而不是返回地址),以及glibc CRT代码保持一致.
The ABI also guarantees that new 32-bit processes start with ESP aligned on a 16B boundary (e.g. at _start
, the ELF entry point, where ESP points at argc, not a return address), and the glibc CRT code maintains that alignment.
就调用约定而言,EBP只是另一个保留呼叫的寄存器.但是,是的,使用-fno-omit-frame-pointer
的编译器输出确实会在其他调用保留寄存器(例如EBX)之前处理push ebp
,因此保存的EBP值形成了一个链表. (因为它也执行mov ebp, esp
部分,即在该推送之后设置帧指针.)
As far as the calling convention is concerned, EBP is just another call-preserved register. But yes, compiler output with -fno-omit-frame-pointer
does take care to push ebp
before other call-preserved registers (like EBX) so the saved EBP values form a linked list. (Because it also does the mov ebp, esp
part of setting up a frame pointer after that push.)
gcc也许是防御性的,因为一个非常古老的Linux内核(从该版本升级到i386 ABI之前,当时所需的对齐方式仅为4B)可能违反了这一假设,而且这只是一条额外的指令,在生命周期中只能运行一次-处理时间(假设程序没有递归调用main
).
Perhaps gcc is defensive because an extremely ancient Linux kernel (from before that revision to the i386 ABI, when the required alignment was only 4B) could violate that assumption, and it's only an extra couple instructions that run once in the life-time of the process (assuming the program doesn't call main
recursively).
与gcc不同,clang假定堆栈在进入main时已正确对齐. (clang还在
Unlike gcc, clang assumes the stack is properly aligned on entry to main. (clang also assumes that narrow args have been sign or zero-extended to 32 bits, even though the current ABI revision doesn't specify that behaviour (yet). gcc and clang both emit code that does in the caller side, but only clang depends on it in the callee. This happens in 64-bit code, but I didn't check 32-bit.)
在 http://gcc.godbolt.org/上查看编译器输出以了解其他主要功能比主要的要好奇.
Look at compiler output on http://gcc.godbolt.org/ for main and functions other than main if you're curious.
我刚刚更新了在前一天标记Wiki. http://x86-64.org/仍然死了,似乎还没有回来,所以我更新了System V链接,以指向HJ Lu的github存储库中当前修订版的PDF,并且
I just updated the ABI links in the x86 tag wiki the other day. http://x86-64.org/ is still dead and seems to be not coming back, so I updated the System V links to point to the PDFs of the current revision in HJ Lu's github repo, and his page with links.
请注意,SCO网站上的最新版本不是 当前版本,并且不包括16B-stack-alignment要求.
Note that the last version on SCO's site is not the current revision, and doesn't include the 16B-stack-alignment requirement.
我认为某些BSD版本仍然不需要/保持16字节堆栈对齐.
I think some BSD versions still don't require / maintain 16-byte stack alignment.
这篇关于32位x86组件中堆栈对齐的责任的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!