Why does this code generate much more assembly than equivalent C++/Clang?
Question
I wrote a simple C++ function in order to check compiler optimization:
bool f1(bool a, bool b) {
return !a || (a && b);
}
After that I checked the equivalent in Rust:
fn f1(a: bool, b: bool) -> bool {
!a || (a && b)
}
I used godbolt to check the assembler output.
The result of the C++ code (compiled by clang with the -O3 flag) is the following:
f1(bool, bool): # @f1(bool, bool)
xor dil, 1
or dil, sil
mov eax, edi
ret
And the result of Rust equivalent is much longer:
example::f1:
push rbp
mov rbp, rsp
mov al, sil
mov cl, dil
mov dl, cl
xor dl, -1
test dl, 1
mov byte ptr [rbp - 3], al
mov byte ptr [rbp - 4], cl
jne .LBB0_1
jmp .LBB0_3
.LBB0_1:
mov byte ptr [rbp - 2], 1
jmp .LBB0_4
.LBB0_2:
mov byte ptr [rbp - 2], 0
jmp .LBB0_4
.LBB0_3:
mov al, byte ptr [rbp - 4]
test al, 1
jne .LBB0_7
jmp .LBB0_6
.LBB0_4:
mov al, byte ptr [rbp - 2]
and al, 1
movzx eax, al
pop rbp
ret
.LBB0_5:
mov byte ptr [rbp - 1], 1
jmp .LBB0_8
.LBB0_6:
mov byte ptr [rbp - 1], 0
jmp .LBB0_8
.LBB0_7:
mov al, byte ptr [rbp - 3]
test al, 1
jne .LBB0_5
jmp .LBB0_6
.LBB0_8:
test byte ptr [rbp - 1], 1
jne .LBB0_1
jmp .LBB0_2
I also tried the -O option, but then the output is empty (the unused function was deleted).
I am intentionally NOT using any library in order to keep the output clean. Please note that both clang and rustc use LLVM as a backend. What explains this huge difference in output? And if it is only a matter of a disabled optimization switch, how can I see the optimized output from rustc?
Recommended answer
Compiling with the compiler flag -O (and with an added pub), I get this output (Link to Godbolt):
push rbp
mov rbp, rsp
xor dil, 1
or dil, sil
mov eax, edi
pop rbp
ret
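For reference, here is a minimal sketch of source that should lead to output like the above. The pub on f1 is the "added pub" mentioned earlier: it exports the function so that it is not discarded as unused. The helper f1_simplified and the main function are hypothetical additions, not part of the original question. They only check, over all four inputs, that the expression reduces to "not a, or b", which is what the xor/or pair computes.

```rust
/// The function from the question. `pub` exports it, so the optimizer
/// does not throw it away as dead code when built as a library.
pub fn f1(a: bool, b: bool) -> bool {
    !a || (a && b)
}

/// Hypothetical helper (not in the original post): the simplified form
/// that matches the optimized `xor dil, 1` / `or dil, sil` sequence.
pub fn f1_simplified(a: bool, b: bool) -> bool {
    !a | b
}

fn main() {
    // Exhaustively compare both forms over the full truth table.
    for a in [false, true] {
        for b in [false, true] {
            assert_eq!(f1(a, b), f1_simplified(a, b));
            println!("a={}, b={} -> {}", a, b, f1(a, b));
        }
    }
}
```

The only input for which the function returns false is a = true, b = false.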
A few things:
Why is it still longer than the C++ version?
The Rust version is exactly three instructions longer:
push rbp
mov rbp, rsp
[...]
pop rbp
These are the instructions that manage the so-called frame pointer or base pointer (rbp). It is mainly needed to get nice stack traces. If you force the C++ compiler to keep the frame pointer via -fno-omit-frame-pointer, you get the same result. Note that this uses g++ instead of clang++, since I haven't found a comparable option for the clang compiler.
Why doesn't Rust omit the frame pointer?
Actually, it does. But Godbolt adds an option to the compiler to preserve the frame pointer. You can read more about why this is done here. If you compile your code locally with rustc -O --crate-type=lib foo.rs --emit asm -C "llvm-args=-x86-asm-syntax=intel", you get this output:
f1:
xor dil, 1
or dil, sil
mov eax, edi
ret
Which is exactly the output of your C++ version.
You can "undo" what Godbolt does by passing -C debuginfo=0 to the compiler.
Why -O instead of --release?
Godbolt uses rustc directly instead of cargo. The --release flag is a flag for cargo. To enable optimizations on rustc, you need to pass -O or -C opt-level=3 (or any other level between 0 and 3).