Why does this code generate much more assembly than equivalent C++/Clang?

Views: 175  Published: 2020/5/21 20:51:34  Tags: optimization, rust, llvm-codegen

Question

I wrote a simple C++ function in order to check compiler optimization:

bool f1(bool a, bool b) {
    return !a || (a && b);
}

After that I checked the equivalent in Rust:

fn f1(a: bool, b: bool) -> bool {
    !a || (a && b)
}

I used godbolt to check the assembler output.

The result of the C++ code (compiled by clang with the -O3 flag) is the following:

f1(bool, bool):                                # @f1(bool, bool)
    xor     dil, 1
    or      dil, sil
    mov     eax, edi
    ret

The result of the Rust equivalent is much longer:

example::f1:
  push rbp
  mov rbp, rsp
  mov al, sil
  mov cl, dil
  mov dl, cl
  xor dl, -1
  test dl, 1
  mov byte ptr [rbp - 3], al
  mov byte ptr [rbp - 4], cl
  jne .LBB0_1
  jmp .LBB0_3
.LBB0_1:
  mov byte ptr [rbp - 2], 1
  jmp .LBB0_4
.LBB0_2:
  mov byte ptr [rbp - 2], 0
  jmp .LBB0_4
.LBB0_3:
  mov al, byte ptr [rbp - 4]
  test al, 1
  jne .LBB0_7
  jmp .LBB0_6
.LBB0_4:
  mov al, byte ptr [rbp - 2]
  and al, 1
  movzx eax, al
  pop rbp
  ret
.LBB0_5:
  mov byte ptr [rbp - 1], 1
  jmp .LBB0_8
.LBB0_6:
  mov byte ptr [rbp - 1], 0
  jmp .LBB0_8
.LBB0_7:
  mov al, byte ptr [rbp - 3]
  test al, 1
  jne .LBB0_5
  jmp .LBB0_6
.LBB0_8:
  test byte ptr [rbp - 1], 1
  jne .LBB0_1
  jmp .LBB0_2

I also tried the -O option, but then the output is empty (the unused function is deleted).

I am intentionally NOT using any library in order to keep the output clean. Please note that both clang and rustc use LLVM as a backend. What explains this huge difference in output? And if the only problem is that optimizations are disabled, how can I see the optimized output from rustc?

Recommended answer

Compiling with the compiler flag -O (and with an added pub), I get this output (Link to Godbolt):

push    rbp
mov     rbp, rsp
xor     dil, 1
or      dil, sil
mov     eax, edi
pop     rbp
ret
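The "added pub" mentioned above is probably what makes the function survive optimization. A private function that nothing calls can be discarded entirely, which would explain the empty output in the question. A minimal sketch of the adjusted source, assuming nothing else was changed:

```rust
// Sketch only: `pub` exports the function from the library crate,
// so the optimizer cannot discard it as unused.
pub fn f1(a: bool, b: bool) -> bool {
    !a || (a && b)
}
```

The behavior is the same as in the question. Only the visibility differs.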

A few things:

Why is it still longer than the C++ version?

The Rust version is exactly three instructions longer:

push    rbp
mov     rbp, rsp
[...]
pop     rbp

These are the instructions that manage the so-called frame pointer or base pointer (rbp). It is mainly needed to get nice stack traces. If you keep the frame pointer in the C++ version by passing -fno-omit-frame-pointer, you get the same result. Note that this uses g++ instead of clang++ since I haven't found a comparable option for the clang compiler.

Why does Rust not omit the frame pointer?

Actually, it does. But Godbolt adds an option to the compiler to preserve the frame pointer. You can read more about why this is done here. If you compile your code locally with rustc -O --crate-type=lib foo.rs --emit asm -C "llvm-args=-x86-asm-syntax=intel", you get this output:

f1:
    xor dil, 1
    or  dil, sil
    mov eax, edi
    ret

Which is exactly the output of your C++ version.
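To see why four instructions are enough, note that `!a || (a && b)` is logically the same as `!a | b`: the `xor dil, 1` flips `a`, and the `or dil, sil` combines the result with `b`. The sketch below checks this on all four inputs. The names `f1_simplified` and `versions_agree` are hypothetical helpers introduced for illustration and do not appear in the original post.

```rust
/// The function from the question.
pub fn f1(a: bool, b: bool) -> bool {
    !a || (a && b)
}

/// Hypothetical helper mirroring the optimized assembly:
/// `xor dil, 1` flips `a`, `or dil, sil` combines it with `b`.
pub fn f1_simplified(a: bool, b: bool) -> bool {
    (a ^ true) | b
}

/// Returns true if both versions agree on all four possible inputs.
pub fn versions_agree() -> bool {
    [false, true].iter().all(|&a| {
        [false, true]
            .iter()
            .all(|&b| f1(a, b) == f1_simplified(a, b))
    })
}
```

Since a bool pair has only four possible values, the exhaustive check is a complete proof of equivalence for this function.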

You can "undo" what Godbolt does by passing -C debuginfo=0 to the compiler.

Why -O instead of --release?

Godbolt uses rustc directly instead of cargo. The --release flag is a cargo flag. To enable optimizations with rustc, you need to pass -O or -C opt-level=3 (or any other level between 0 and 3).
