32 位 x86 汇编中堆栈对齐的职责 [英] Responsibility of stack alignment in 32-bit x86 assembly

查看:30
本文介绍了32 位 x86 汇编中堆栈对齐的职责的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图清楚地了解谁(调用者或被调用者)负责堆栈对齐.64 位汇编的情况相当清楚,它是由调用者进行的.

I am trying to get a clear picture of who (caller or callee) is reponsible of stack alignment. The case for 64-bit assembly is rather clear, that it is by caller.

参考 System V AMD64 ABI,第 3.2.2 节 堆栈框架:

Referring to System V AMD64 ABI, section 3.2.2 The Stack Frame:

输入参数区域的末尾应对齐在 16(32,如果__m256 在堆栈上传递)字节边界.

The end of the input argument area shall be aligned on a 16 (32, if __m256 is passed on stack) byte boundary.

换句话说,应该可以安全地假设,对于被调用函数的每个入口点:

In other words, it should be safe to assume, that for every entry point of called function:

<代码>16 |(%rsp + 8)

holds(额外的 8 个是因为 call 隐式地将返回地址压入堆栈).

holds (extra eight is because call implicitely pushes return address on stack).

它在 32 位世界中的样子(假设是 cdecl)?我注意到 gcc 将对齐放在 inside 被调用的函数中,其结构如下:

How it looks in 32-bit world (assuming cdecl)? I noticed that gcc places the alignment inside the called function with following construct:

and esp, -16

这似乎表明,这是被调用者的责任.

which seems to indicate, that is callee's responsibility.

为了更清楚,请考虑以下 NASM 代码:

To put it clearer, consider following NASM code:

global main
extern printf
extern scanf
section .rodata
    s_fmt   db "%d %d", 0
    s_res   db `%d with remainder %d
`, 0
section .text
main:
    start   0, 0
    sub     esp, 8
    mov     DWORD [ebp-4], 0 ; dividend
    mov     DWORD [ebp-8], 0 ; divisor

    lea     eax, [ebp-8]
    push    eax
    lea     eax, [ebp-4]
    push    eax
    push    s_fmt
    call    scanf
    add     esp, 12

    mov     eax, [ebp-4]
    cdq
    idiv    DWORD [ebp-8]

    push    edx
    push    eax
    push    s_res
    call    printf

    xor     eax, eax
    leave
    ret

在调用scanf之前是否需要对齐堆栈?如果是这样,那么这将需要在将这两个参数推送到 scanf 之前将 %esp 减少四个字节:

Is it required to align the stack before scanf is called? If so, then this would require to decrease %esp by four bytes before pushing these two arguments to scanf as:

4 bytes (return address)
4 bytes (%ebp of previous stack frame)
8 bytes (for two variables)
12 bytes (three arguments for scanf)
= 28

推荐答案

GCC onlymain 中做这个额外的堆栈对齐;该函数很特别.如果您查看任何其他函数的代码生成,您将不会看到它,除非您有一个带有 alignas(32) 或其他东西的本地函数.

GCC only does this extra stack alignment in main; that function is special. You won't see it if you look at code-gen for any other function, unless you have a local with alignas(32) or something.

GCC 只是对 -m32 采取防御性方法,不假设 main 是使用正确的 16B 对齐堆栈调用的.或者这种特殊待遇是在 -mpreferred-stack-boundary=4 只是一个好主意而不是法律时遗留下来的.

GCC is just taking a defensive approach with -m32, by not assuming that main is called with a properly 16B-aligned stack. Or this special treatment is left over from when -mpreferred-stack-boundary=4 was only a good idea, not the law.

多年来,i386 System V ABI 一直保证/要求 ESP+4 在进入函数时是 16B 对齐的.(即 ESP 必须在 CALL 指令之前以 16B 对齐,因此堆栈上的参数从 16B 边界开始.这与 x86-64 System V 相同.)

The i386 System V ABI has guaranteed/required for years that ESP+4 is 16B-aligned on entry to a function. (i.e. ESP must be 16B-aligned before a CALL instruction, so args on the stack start at a 16B boundary. This is the same as for x86-64 System V.)

ABI 还保证新的 32 位进程以 ESP 对齐在 16B 边界上开始(例如在 _start,ELF 入口点,其中 ESP 指向 argc,而不是返回地址),并且 glibc CRT 代码保持这种对齐.

The ABI also guarantees that new 32-bit processes start with ESP aligned on a 16B boundary (e.g. at _start, the ELF entry point, where ESP points at argc, not a return address), and the glibc CRT code maintains that alignment.

就调用约定而言,EBP 只是另一个调用保留寄存器.但是,是的,带有 -fno-omit-frame-pointer 的编译器输出确实会在其他调用保留寄存器(如 EBX)之前注意 push ebp,因此保存的 EBP 值形成一个链表.(因为它也做了 mov ebp, esp 在推送之后设置帧指针的部分.)

As far as the calling convention is concerned, EBP is just another call-preserved register. But yes, compiler output with -fno-omit-frame-pointer does take care to push ebp before other call-preserved registers (like EBX) so the saved EBP values form a linked list. (Because it also does the mov ebp, esp part of setting up a frame pointer after that push.)

也许 gcc 是防御性的,因为一个非常古老的 Linux 内核(从对 i386 ABI 的那个修订版之前,当时所需的对齐只有 4B)可能会违反这个假设,而且它只是在生命周期中运行一次的额外指令-进程的时间(假设程序不递归调用main).

Perhaps gcc is defensive because an extremely ancient Linux kernel (from before that revision to the i386 ABI, when the required alignment was only 4B) could violate that assumption, and it's only an extra couple instructions that run once in the life-time of the process (assuming the program doesn't call main recursively).

与 gcc 不同,clang 假设堆栈在进入 main 时正确对齐.(clang 也 假设窄参数已被符号或零扩展到 32 位,即使当前的 ABI 修订版并未指定该行为(尚未).gcc 和 clang 都发出了这样的代码在调用方,但在被调用方中只有 clang 依赖于它.这发生在 64 位代码中,但我没有检查 32 位.)

Unlike gcc, clang assumes the stack is properly aligned on entry to main. (clang also assumes that narrow args have been sign or zero-extended to 32 bits, even though the current ABI revision doesn't specify that behaviour (yet). gcc and clang both emit code that does in the caller side, but only clang depends on it in the callee. This happens in 64-bit code, but I didn't check 32-bit.)

查看 http://gcc.godbolt.org/ 上的编译器输出,了解主要和其他功能如果你好奇,主要.

Look at compiler output on http://gcc.godbolt.org/ for main and functions other than main if you're curious.

我刚刚更新了 前几天标记维基.http://x86-64.org/ 还是死了,好像不会回来了,所以我更新了System V 链接指向 HJ Lu 的 github repo 中当前修订版的 PDF,以及 他的带有链接的页面.

I just updated the ABI links in the x86 tag wiki the other day. http://x86-64.org/ is still dead and seems to be not coming back, so I updated the System V links to point to the PDFs of the current revision in HJ Lu's github repo, and his page with links.

请注意,SCO 网站上的最新版本不是 当前版本,不包括 16B 堆栈对齐要求.

Note that the last version on SCO's site is not the current revision, and doesn't include the 16B-stack-alignment requirement.

我认为某些 BSD 版本仍然不需要/保持 16 字节堆栈对齐.

I think some BSD versions still don't require / maintain 16-byte stack alignment.

这篇关于32 位 x86 汇编中堆栈对齐的职责的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆