Remove needless assembler statements from g++ output

Question

I am investigating some problem with a local binary. I've noticed that g++ creates a lot of ASM output that seems unnecessary to me. Example with -O0:

Derived::Derived():
    pushq   %rbp
    movq    %rsp, %rbp
    subq    $16, %rsp          <--- just need 8 bytes for the movq to -8(%rbp), why -16?
    movq    %rdi, -8(%rbp)
    movq    -8(%rbp), %rax
    movq    %rax, %rdi         <--- now we have moved rdi onto itself.
    call    Base::Base()
    leaq    16+vtable for Derived(%rip), %rdx
    movq    -8(%rbp), %rax     <--- effectively %edi, does not point into this area of the stack
    movq    %rdx, (%rax)       <--- thus this wont change -8(%rbp)
    movq    -8(%rbp), %rax     <--- so this statement is unnecessary
    movl    $4712, 12(%rax)
    nop
    leave
    ret

With options -O1 -fno-inline -fno-elide-constructors -fno-omit-frame-pointer:

Derived::Derived():
    pushq   %rbp
    movq    %rsp, %rbp
    pushq   %rbx
    subq    $8, %rsp       <--- reserve some stack space and never use it.
    movq    %rdi, %rbx
    call    Base::Base()
    leaq    16+vtable for Derived(%rip), %rax
    movq    %rax, (%rbx)
    movl    $4712, 12(%rbx)
    addq    $8, %rsp       <--- release unused stack space.
    popq    %rbx
    popq    %rbp
    ret

This code is for the constructor of Derived that calls the Base base constructor and then overrides the vtable pointer at position 0 and sets a constant value to an int member it holds in addition to what Base contains.
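
The question does not show its C++ source. As a rough sketch only, a class pair like the following would plausibly produce the layout described (vtable pointer at offset 0, the extra int at offset 12 on x86-64 System V). The member names base_value and extra, the out-of-line constructors, and the demo function are assumptions made for illustration.

```cpp
// Hypothetical reconstruction; names and layout are assumptions.
// To inspect the asm, something like:
//   g++ -O1 -fno-inline -fno-omit-frame-pointer -S demo.cpp
#include <cassert>

struct Base {
    Base();
    virtual ~Base() = default;  // any virtual function adds a vtable pointer
    int base_value;             // assumed name; sits after the vtable pointer
};

struct Derived : Base {
    Derived();
    int extra;                  // assumed name; the member that receives 4712
};

Base::Base() : base_value(0) {}

// Runs Base::Base() first, then installs Derived's vtable pointer,
// then stores the constant into the extra member.
Derived::Derived() : extra(4712) {}

// Usage example: construct an object and check the member.
void demo_derived() {
    Derived d;
    assert(d.extra == 4712);
}
```

Keeping the constructors out of line makes it easier to find Derived::Derived() as its own symbol in the compiler output.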

Questions:

  • Can I translate my program with as few optimizations as possible and get rid of such stuff? Which options would I have to set? Or is there a reason the compiler cannot detect these cases with -O0 or -O1 and there is no way around them?
  • Why is the subq $8, %rsp statement generated at all? You cannot optimize in or out a statement that makes no sense to begin with. Why does the compiler generate it then? The register allocation algorithm should never, even with -O0, generate code for something that is not there. So why is it done?

Recommended answer

I don't see any obvious missed optimizations in your -O1 output, except of course setting up RBP as a frame pointer. But you used -fno-omit-frame-pointer, so clearly you know why GCC didn't optimize that away.

"The function has no local variables"

Your function is a non-static class member function, so it has one implicit arg: this, passed in RDI, which g++ spills to the stack because of -O0. Function args count as local variables.
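
To make the implicit argument concrete, here is a small sketch (the Counter type and both function names are made up for illustration). On x86-64 System V, a non-static member function and a free function taking the object pointer as its first parameter both receive that pointer in RDI.

```cpp
#include <cassert>

struct Counter {
    int value;
    // `this` is a hidden first argument of every non-static member function.
    int get() const { return value; }
};

// Free-function equivalent: the pointer that was implicit is spelled out.
int counter_get(const Counter* self) { return self->value; }

// Usage example: both forms read the same object.
void demo_this() {
    Counter c{7};
    assert(c.get() == counter_get(&c));
}
```

At -O0 either version would be expected to store its pointer argument to the stack first, which is the movq %rdi, -8(%rbp) seen in the question.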

"How does a cyclic move without an effect improve the debugging experience? Please elaborate."

To improve C/C++ debugging: debug-info formats can only describe a C variable's location relative to RSP or RBP, not which register it's currently in. Spilling everything also means you can modify any variable with a debugger and continue, getting the expected results as if you'd done that in the C++ abstract machine. Every statement is compiled to a separate block of asm with no values kept alive in registers between statements. (Fun fact: a variable declared like register int foo is the exception; that keyword does affect debug-mode code gen.)

The Q&A "Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?" applies to G++ and other compilers as well.

"Which options would I have to set?"

If you're reading / debugging the asm, use at least -Og to disable the debug-mode spill-everything-between-statements behaviour of -O0. Preferably use -O2 or -O3, unless you like seeing even more missed optimizations than you'd get with full optimization. But -Og or -O1 will do register allocation, make sane loops (with the conditional branch at the bottom), and apply various simple optimizations, although still not the standard xor-zeroing peephole.

The Q&A "How to remove 'noise' from GCC/clang assembly output?" explains how to write functions that take args and return a value, so you can write functions that don't optimize away.
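
As a minimal sketch of that advice (both function names are invented for this example): when the inputs are compile-time constants and the result is unused, an optimizing build has nothing it must keep, whereas a function whose inputs arrive as arguments and whose result is returned has to contain the real computation.

```cpp
#include <cassert>

// Constant inputs, unused result: with optimization enabled the compiler
// is free to fold or delete all of this, leaving little asm to look at.
void nothing_to_see() {
    int a = 6, b = 7;
    int product = a * b;
    (void)product;
}

// Inputs come in as arguments and the result is returned, so the emitted
// code for this function has to perform the multiply and the add.
int mul_add(int a, int b, int c) {
    return a * b + c;
}

// Usage example.
void demo_mul_add() {
    assert(mul_add(6, 7, 0) == 42);
}
```

Compiling only the function of interest with -O2 -S (rather than a whole program with a main) keeps the output focused on the code you want to read.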

Loading into RAX and then movq %rax, %rdi is just a side-effect of -O0. GCC spends so little time optimizing the GIMPLE and/or RTL internal representations of the program logic (before emitting x86 asm) that it doesn't even notice it could have loaded into RDI in the first place. Part of the point of -O0 is to compile quickly, as well as consistent debugging.

"Why is the subq $8, %rsp statement generated at all?"

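Because the ABI requires 16-byte stack alignment before a call instruction, and this function did an even number of 8-byte pushes (call itself pushes a return address). On entry RSP is offset by 8 from a 16-byte boundary; pushing RBP and RBX leaves it offset by 8 again, so the extra subq $8 restores alignment before the call to Base::Base(). It will go away at -O1 without -fno-omit-frame-pointer, because then you aren't forcing g++ to push/pop RBP in addition to the one call-preserved register it actually needs.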
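
The alignment arithmetic can be sketched as a small compile-time calculation. This is only a model of the minimum adjustment, assuming the x86-64 System V rule that RSP+8 is 16-byte aligned on function entry; stack_adjust is a made-up helper, not anything GCC exposes, and GCC may reserve more than this minimum.

```cpp
#include <cassert>

// Minimum number of bytes to subtract from RSP so that RSP is 16-byte
// aligned at the next call, given how many 8-byte registers the prologue
// pushed and how many bytes of locals it needs.
// Assumption: on entry, the return address (8 bytes) is already on the stack.
constexpr int stack_adjust(int pushes, int local_bytes) {
    // (return address + pushes + locals), rounded up to a multiple of 16,
    // minus what the call and the pushes already consumed.
    return (8 + 8 * pushes + local_bytes + 15) / 16 * 16 - (8 + 8 * pushes);
}

// -O1 listing from the question: push rbp, push rbx, no locals.
static_assert(stack_adjust(2, 0) == 8, "two pushes need 8 bytes of padding");
// -O0 listing from the question: push rbp, one 8-byte spill slot for this.
static_assert(stack_adjust(1, 8) == 16, "8 bytes of locals round up to 16");
// One push and no locals: already aligned, so no sub is needed.
static_assert(stack_adjust(1, 0) == 0, "an odd number of pushes is aligned");

// Usage example at run time.
void demo_stack_adjust() {
    assert(stack_adjust(2, 0) == 8);
}
```

Under this model the subq $16 in the -O0 listing is also explained: one push plus an 8-byte spill slot cannot be made 16-byte aligned with only 8 bytes.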

See also: "Why does the System V / AMD64 ABI require 16-byte stack alignment?"

Fun fact: clang will often use a dummy push %rcx/pop or something, depending on -mtune options, instead of an 8-byte sub.

If it were a leaf function, g++ would just use the red-zone below RSP for locals, even at -O0. See "Why is there no 'sub rsp' instruction in this function prologue and why are function parameters stored at negative rbp offsets?"
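
A short sketch of the leaf / non-leaf distinction, with invented function names. The comments describe what one would expect to see from g++ -O0 on x86-64 System V; the exact output depends on compiler version and options, so treat them as expectations to check rather than guarantees.

```cpp
#include <cassert>

// Leaf function: it calls nothing, so nothing it does can overwrite the
// 128 bytes below RSP (the red zone). Its spilled args and local t can
// live there, and the prologue is expected to need no sub on RSP.
int leaf_sum(int a, int b) {
    int t = a + b;
    return t;
}

// Hypothetical callee, only here to make non_leaf() a non-leaf function.
int helper(int x) { return x * 2; }

// Non-leaf function: the call pushes a return address just below RSP,
// so its locals need stack space reserved by the prologue.
int non_leaf(int a) {
    int t = helper(a);
    return t + 1;
}

// Usage example.
void demo_leaf() {
    assert(leaf_sum(2, 3) == 5);
    assert(non_leaf(4) == 9);
}
```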

In un-optimized code it's not rare for G++ to allocate an extra 16 bytes it doesn't ever use. Even with optimization enabled, g++ sometimes rounds up its stack allocation size too far when aiming for a 16-byte boundary. This is a missed-optimization bug; see e.g. "Memory allocation and addressing in Assembly".
