Is < faster than <=?
Is if (a < 901) faster than if (a <= 900)?
Not exactly as in this simple example, but I have seen slight performance changes in complex loop code. I suppose this has something to do with the generated machine code, if it is even true.
No, it will not be faster on most architectures. You didn't specify, but on x86, all of the integral comparisons will typically be implemented in two machine instructions:
- A test or cmp instruction, which sets EFLAGS
- And a Jcc (jump) instruction, depending on the comparison type (and code layout):
  - jne - Jump if not equal --> ZF = 0
  - jz - Jump if zero (equal) --> ZF = 1
  - jg - Jump if greater --> ZF = 0 and SF = OF
  - (etc...)
Example (edited for brevity), compiled with $ gcc -m32 -S -masm=intel test.c
if (a < b) {
// Do something 1
}
Compiles to:
mov eax, DWORD PTR [esp+24] ; a
cmp eax, DWORD PTR [esp+28] ; b
jge .L2 ; jump if a is >= b
; Do something 1
.L2:
And
if (a <= b) {
// Do something 2
}
Compiles to:
mov eax, DWORD PTR [esp+24] ; a
cmp eax, DWORD PTR [esp+28] ; b
jg .L5 ; jump if a is > b
; Do something 2
.L5:
So the only difference between the two is a jg versus a jge instruction. The two will take the same amount of time.
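The test.c behind the listings above is not shown, so the following is only a hypothetical reconstruction for anyone who wants to repeat the experiment. All function names are made up for illustration. It also includes the constant forms from the question, which are the same predicate for any int.

```c
/* test.c: hypothetical reconstruction, not the original source. */
#include <assert.h>
#include <stdbool.h>

/* The two general comparisons discussed above. */
bool less_than(int a, int b)     { return a < b;  }
bool less_or_equal(int a, int b) { return a <= b; }

/* The constant forms from the question. For integers, a < 901 and
   a <= 900 accept exactly the same values. */
bool below_901(int a)   { return a < 901;  }
bool at_most_900(int a) { return a <= 900; }

/* Sanity check: both constant forms agree around the boundary. */
void check_equivalence(void) {
    for (int a = 890; a <= 910; ++a)
        assert(below_901(a) == at_most_900(a));
}
```

Compiling this with the same command (gcc -m32 -S -masm=intel test.c) and comparing the function bodies should show whether your compiler treats the pairs differently. Because these functions return the result instead of branching on it, the compiler may emit setl/setle rather than jge/jg; the exact output depends on compiler version and optimization flags.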
I'd like to address the comment that nothing indicates that the different jump instructions take the same amount of time. This one is a little tricky to answer, but here's what I can give: in the Intel Instruction Set Reference, they are all grouped together under one common instruction, Jcc (Jump if condition is met). The same grouping is used in the Optimization Reference Manual, in Appendix C, Latency and Throughput.
Latency — The number of clock cycles that are required for the execution core to complete the execution of all of the μops that form an instruction.
Throughput — The number of clock cycles required to wait before the issue ports are free to accept the same instruction again. For many instructions, the throughput of an instruction can be significantly less than its latency
The values for Jcc are:
       Latency   Throughput
Jcc    N/A       0.5
with the following footnote on Jcc:
- Selection of conditional jump instructions should be based on the recommendation of Section 3.4.1, "Branch Prediction Optimization," to improve the predictability of branches. When branches are predicted successfully, the latency of jcc is effectively zero.
So, nothing in the Intel docs ever treats one Jcc instruction any differently from the others.
If one thinks about the actual circuitry used to implement the instructions, one can assume that there would be simple AND/OR gates on the different bits in EFLAGS to determine whether the conditions are met. There is, then, no reason that an instruction testing two bits should take any more or less time than one testing only one (ignoring gate propagation delay, which is much less than the clock period).
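To make the "simple AND/OR gates" point concrete, here is a small software model of the flag conditions that the signed jumps test. It is a sketch, not a description of any real circuit: the struct and function names are invented for illustration, and only ZF, SF and OF are modelled. The conditions themselves (jg: ZF = 0 and SF = OF; jge: SF = OF; jl: SF != OF; jle: ZF = 1 or SF != OF) are the documented ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model of the three EFLAGS bits read by signed Jcc. */
struct flags { bool zf, sf, of; };

/* Model of `cmp a, b`: computes a - b and keeps only the flags. */
struct flags cmp32(int32_t a, int32_t b) {
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t r = ua - ub;                  /* wrapping subtraction */
    struct flags f;
    f.zf = (r == 0);                       /* result is zero */
    f.sf = (r >> 31) != 0;                 /* sign bit of the result */
    /* Signed overflow: operands differ in sign and the result's
       sign differs from a's sign. */
    f.of = (((ua ^ ub) & (ua ^ r)) >> 31) != 0;
    return f;
}

/* Each condition is a one- or two-gate function of the flag bits. */
bool jg_taken(struct flags f)  { return !f.zf && (f.sf == f.of); }
bool jge_taken(struct flags f) { return f.sf == f.of; }
bool jl_taken(struct flags f)  { return f.sf != f.of; }
bool jle_taken(struct flags f) { return f.zf || (f.sf != f.of); }

/* Self-check against C's own signed comparison operators. */
void check_flags_model(void) {
    const int32_t s[] = { INT32_MIN, -1, 0, 1, 900, 901, INT32_MAX };
    const int n = (int)(sizeof s / sizeof s[0]);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            struct flags f = cmp32(s[i], s[j]);
            assert(jg_taken(f)  == (s[i] >  s[j]));
            assert(jge_taken(f) == (s[i] >= s[j]));
            assert(jl_taken(f)  == (s[i] <  s[j]));
            assert(jle_taken(f) == (s[i] <= s[j]));
        }
    }
}
```

In this model jg differs from jge only by one extra AND with the inverted ZF bit, which is the kind of difference the paragraph above argues cannot plausibly cost a clock cycle.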
Edit: Floating Point
This holds true for x87 floating point as well (pretty much the same code as above, but with double instead of int):
fld QWORD PTR [esp+32]
fld QWORD PTR [esp+40]
fucomip st, st(1) ; Compare ST(0) and ST(1), and set CF, PF, ZF in EFLAGS
fstp st(0)
seta al ; Set al if above (CF=0 and ZF=0).
test al, al
je .L2
; Do something 1
.L2:
fld QWORD PTR [esp+32]
fld QWORD PTR [esp+40]
fucomip st, st(1) ; (same thing as above)
fstp st(0)
setae al ; Set al if above or equal (CF=0).
test al, al
je .L5
; Do something 2
.L5:
leave
ret
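The C source for the floating-point listing is not shown either. A hypothetical equivalent, with made-up function names, might look like the sketch below. Whether a given compiler emits x87 fucomip with seta/setae or SSE2 comparisons instead depends on the target and flags, so treat the listing above as one possible outcome.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical double versions of the two comparisons. */
bool double_less(double a, double b)          { return a < b;  }
bool double_less_or_equal(double a, double b) { return a <= b; }

/* Sanity check, including the unordered (NaN) case, where both
   comparisons are false. */
void check_double_compare(void) {
    assert(double_less(1.0, 2.0));
    assert(!double_less(2.0, 2.0));
    assert(double_less_or_equal(2.0, 2.0));
    assert(!double_less(NAN, 1.0));
    assert(!double_less_or_equal(NAN, 1.0));
}
```

The NaN case fits the listing: an unordered fucomip sets CF, so neither seta (CF=0 and ZF=0) nor setae (CF=0) sets al.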