GCC placing register args on the stack with a gap below local variables?


Question


    I tried to look at the assembly code for a very simple program.

    int func(int x) {
        int z = 1337;
        return z;
    } 
    

    With GCC -O0, every C variable has a memory address that's not optimized away, so gcc spills its register arg: (Godbolt, gcc5.5 -O0 -fverbose-asm)

    func:
            pushq   %rbp  #
            movq    %rsp, %rbp      #,
            movl    %edi, -20(%rbp) # x, x
            movl    $1337, -4(%rbp) #, z
            movl    -4(%rbp), %eax  # z, D.2332
            popq    %rbp    #
            ret
    

    What is the reason that the function parameter x gets placed on the stack below the local variables? Why not place it at -4(%rbp) and the local below that?

    And when placing it below the local variables, why not place it at -8(%rbp)?

    Why leave a gap, using more of the stack than necessary? Couldn't this touch a new cache line that wouldn't otherwise have been touched in this leaf function?

    Answer

    (First of all, don't expect efficient decisions at -O0. It turns out that the things you noticed at -O0 still happen at -O3 if we use volatile or other things to force the compiler to allocate stack space; otherwise this question would be a lot less interesting.)

    What is the reason that the function parameter x gets placed on the stack below the local variables?

    The choice is 100% arbitrary, and depends on compiler internals. GCC and clang both happen to make that choice, but it's basically irrelevant. The args arrive in registers and basically are just locals so it's totally up to the compiler to decide where to spill them (or not spill at all, if you enable optimization).

    But why save it further down the stack than really necessary?

    Because of known(?) GCC missed-optimization bugs that waste stack space. For example, "Why does GCC allocate more space than necessary on the stack?" demonstrates x86-64 GCC -O3 allocating 24 instead of 8 bytes of stack space, where clang allocates 8. (I think I've seen a bug report about sometimes using an extra 16 bytes of space when GCC needs to move RSP (unlike here where it's just using the red zone), but I can't find it on the GCC bugzilla.)

    Note that the x86-64 System V ABI mandates 16-byte stack alignment before a call. After push %rbp and setting up RBP as a frame pointer, RBP and RSP are 16-byte aligned. That means -4(%rbp) and -8(%rbp) are in the same aligned 16-byte chunk of stack space, so a slot at -8(%rbp) could never touch a cache line or page that z doesn't already touch. (A naturally-aligned chunk of memory can't cross any boundary wider than itself, and x86-64 cache lines are always at least 32 bytes; these days always 64 bytes.) -20(%rbp) is in the next 16-byte chunk down, so depending on where the frame sits it could be in a different cache line from z.

    The same goes if we add a 2nd arg, int y: gcc5.5 (and current gcc9.2 -O0) spills it to -24(%rbp), which could also be in a new cache line.


    It turns out this missed optimization is not just because you used -O0 (compile fast, skip most optimization passes, make bad asm). Finding missed optimizations in -O0 output is meaningless unless they're still present at an optimization level anyone cares about, specifically -Os, -O2 or -O3.

    We can prove it with code that uses volatile to still make gcc allocate stack space for args/locals at -O3. Another option would have been to pass their address to another function, but then GCC would have to reserve space instead of just using the red-zone below RSP.

    int *volatile sink;
    
    int func(int x, int y) {
        sink = &x;
        sink = &y;
        int z = 1337;
        sink = &z;
        return z;
    }
    

    (Godbolt, gcc9.2)

    # gcc9.2 -O3  (hand-edited comments)
    func(int, int):
            leaq    -20(%rsp), %rax                 # &x
            movq    %rax, sink(%rip)        # tmp84, sink
            leaq    -24(%rsp), %rax                 # &y
            movq    %rax, sink(%rip)        # tmp86, sink
            leaq    -4(%rsp), %rax                  # &z
            movq    %rax, sink(%rip)        # tmp88, sink
            movl    $1337, %eax     #,
            ret     
    sink:
            .zero   8
    

    Fun fact: clang -O3 spills the register args to the stack before storing their address to sink, as if it were a std::atomic release-store of the address and another thread could maybe load their value after getting the pointer from sink. But it doesn't do that for z. Actually spilling x and y is just a missed optimization, and I can only speculate on what part of clang's internal machinery might be to blame.

    Anyway, clang does allocate z at -4(%rsp), x at -8, y at -12. So for whatever reason, clang also chooses to put the spill slots for the args below the locals.


    Related:

    • "Waste in memory allocation for local variables" discusses GCC's main not assuming 16-byte alignment on entry to main.

    • Several possible duplicates about GCC allocating extra stack space for variables, but mostly just as required by alignment, not extra.
