Why is a ternary operator with two constants faster than one with a variable?
Question
In Java, I have two different statements that produce the same result using ternary operators:
num < 0 ? 0 : num;
num * (num < 0 ? 0 : 1);
It appears that the second statement is unnecessarily complex and should take longer than the first. However, when I timed each one with the following code, the results (listed after the code) were as follows:
final long startTime = System.currentTimeMillis();
Random rand = new Random();
float[] results = new float[100000000];
for (int i = 0; i < 100000000; i++) {
    float num = (rand.nextFloat() * 2) - 1;
    results[i] = num < 0 ? 0 : num;
    //results[i] = num * (num < 0 ? 0 : 1);
}
final long endTime = System.currentTimeMillis();
System.out.println("Total Time: " + (endTime - startTime));
- num < 0 ? 0 : num: 1.232 seconds
- num * (num < 0 ? 0 : 1): 1.023 seconds
(Each averaged over 5 runs)
Why is there such a significant speedup when using the second statement? It seems to include an unnecessary multiplication and performs the same comparison. Does the first create a branch whilst the second does not?
Recommended answer
First, let's rewrite the benchmark with JMH to avoid common benchmarking pitfalls.
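import java.util.concurrent.ThreadLocalRandom;
import org.openjdk.jmh.annotations.Benchmark;

public class FloatCompare {

    // Ternary that selects between a constant and the variable itself
    @Benchmark
    public float cmp() {
        float num = ThreadLocalRandom.current().nextFloat() * 2 - 1;
        return num < 0 ? 0 : num;
    }

    // Ternary that selects between two integer constants, then multiplies
    @Benchmark
    public float mul() {
        float num = ThreadLocalRandom.current().nextFloat() * 2 - 1;
        return num * (num < 0 ? 0 : 1);
    }
}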
JMH also suggests that the multiplication code is way faster:
Benchmark Mode Cnt Score Error Units
FloatCompare.cmp avgt 5 12,940 ± 0,166 ns/op
FloatCompare.mul avgt 5 6,182 ± 0,101 ns/op
Now it's time to engage the perfasm profiler (built into JMH) to see the assembly produced by the JIT compiler. Here are the most important parts of the output (comments are mine):
cmp method:
5,65% │││ 0x0000000002e717d0: vxorps xmm1,xmm1,xmm1 ; xmm1 := 0
0,28% │││ 0x0000000002e717d4: vucomiss xmm1,xmm0 ; compare num < 0 ?
4,25% │╰│ 0x0000000002e717d8: jbe 2e71720h ; jump if num >= 0
9,77% │ ╰ 0x0000000002e717de: jmp 2e71711h ; jump if num < 0
mul method:
1,59% ││ 0x000000000321f90c: vxorps xmm1,xmm1,xmm1 ; xmm1 := 0
3,80% ││ 0x000000000321f910: mov r11d,1h ; r11d := 1
││ 0x000000000321f916: xor r8d,r8d ; r8d := 0
││ 0x000000000321f919: vucomiss xmm1,xmm0 ; compare num < 0 ?
2,23% ││ 0x000000000321f91d: cmovnbe r11d,r8d ; r11d := r8d if num < 0
5,06% ││ 0x000000000321f921: vcvtsi2ss xmm1,xmm1,r11d ; xmm1 := (float) r11d
7,04% ││ 0x000000000321f926: vmulss xmm0,xmm1,xmm0 ; multiply
The key difference is that there are no jump instructions in the mul method. Instead, the conditional move instruction cmovnbe is used.
cmov works with integer registers. Since the (num < 0 ? 0 : 1) expression uses integer constants on the right side, the JIT is smart enough to emit a conditional move instead of a conditional jump.
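As a rough source-level sketch, the steps of the mul assembly can be written out one at a time: select an integer, convert it to float, then multiply. The class name SelectSketch and the method name mulSteps are hypothetical and not part of the original benchmark, and whether the JIT really emits cmov for this code depends on the JVM version and CPU.

```java
// Hypothetical sketch; names are illustrative, not from the original benchmark.
final class SelectSketch {

    // Branchy form: the ternary picks between a constant and a float variable.
    static float cmp(float num) {
        return num < 0 ? 0 : num;
    }

    // The same steps as the mul assembly, written out one at a time.
    static float mulSteps(float num) {
        int selector = num < 0 ? 0 : 1;   // integer select (a cmov candidate)
        float factor = (float) selector;  // int -> float conversion (vcvtsi2ss)
        return num * factor;              // floating-point multiply (vmulss)
    }

    public static void main(String[] args) {
        float[] samples = {-0.75f, -0.0f, 0.0f, 0.25f, 0.999f};
        for (float s : samples) {
            System.out.println(s + " -> cmp=" + cmp(s) + ", mulSteps=" + mulSteps(s));
        }
    }
}
```

The two forms agree under ==, but not bit for bit: for a negative input mulSteps returns -0.0f where cmp returns 0.0f, and for negative infinity the product is NaN. The second case cannot occur for inputs in [-1, 1), and the first is invisible to an == comparison.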
In this benchmark, the conditional jump is very inefficient, since branch prediction often fails due to the random nature of the numbers. That's why the branchless code of the mul method appears faster.
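To get a feel for why random signs defeat prediction, here is a toy model: a single 2-bit saturating counter fed with the outcomes of num < 0. This is only an illustration. Real branch predictors are far more sophisticated, and the ToyPredictor class, its method names, and the fixed seed are all assumptions made for this sketch.

```java
import java.util.Random;

// Toy model only: real hardware predictors are far more sophisticated.
final class ToyPredictor {

    // Fraction of mispredictions made by a 2-bit saturating counter
    // over a sequence of branch outcomes (true = condition held).
    static double mispredictRate(boolean[] outcomes) {
        int state = 0;   // states 0..1 predict false, states 2..3 predict true
        int misses = 0;
        for (boolean actual : outcomes) {
            boolean predicted = state >= 2;
            if (predicted != actual) {
                misses++;
            }
            // Move the counter one step toward the actual outcome.
            state = actual ? Math.min(3, state + 1) : Math.max(0, state - 1);
        }
        return (double) misses / outcomes.length;
    }

    // Outcomes of "num < 0" for num drawn uniformly from [-1, 1).
    static boolean[] randomSigns(int n, long seed) {
        Random rand = new Random(seed);
        boolean[] out = new boolean[n];
        for (int i = 0; i < n; i++) {
            float num = rand.nextFloat() * 2 - 1;
            out[i] = num < 0;
        }
        return out;
    }

    public static void main(String[] args) {
        boolean[] random = randomSigns(1_000_000, 42L);
        boolean[] constant = new boolean[1_000_000]; // all false: num is never negative
        System.out.println("random input:   " + mispredictRate(random));
        System.out.println("constant input: " + mispredictRate(constant));
    }
}
```

With evenly split random outcomes, no predictor can do better than roughly one miss in two, while an input whose sign never changes is predicted without any misses in this model.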
If we modify the benchmark so that one branch prevails over the other, e.g. by replacing
ThreadLocalRandom.current().nextFloat() * 2 - 1
with
ThreadLocalRandom.current().nextFloat() * 2 - 0.1f
then the branch prediction will work better, and the cmp method will become as fast as mul:
Benchmark Mode Cnt Score Error Units
FloatCompare.cmp avgt 5 5,793 ± 0,045 ns/op
FloatCompare.mul avgt 5 5,764 ± 0,048 ns/op
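The effect of the changed offset on the input can be checked directly by counting how often num < 0 holds for each expression. The sketch below is an illustration only: the NegativeShare class and its method are hypothetical, and it uses a seeded java.util.Random instead of ThreadLocalRandom so that runs are repeatable.

```java
import java.util.Random;

// Hypothetical helper; not part of the original benchmark.
final class NegativeShare {

    // Share of samples for which "num < 0" holds,
    // where num = nextFloat() * 2 - offset.
    static double negativeShare(float offset, int n, long seed) {
        Random rand = new Random(seed);
        int negatives = 0;
        for (int i = 0; i < n; i++) {
            float num = rand.nextFloat() * 2 - offset;
            if (num < 0) {
                negatives++;
            }
        }
        return (double) negatives / n;
    }

    public static void main(String[] args) {
        System.out.println("offset 1:    " + negativeShare(1f, 1_000_000, 42L));
        System.out.println("offset 0.1f: " + negativeShare(0.1f, 1_000_000, 42L));
    }
}
```

With an offset of 1 about half of the values are negative, so the branch is close to a coin flip. With an offset of 0.1f only about 5% are negative (nextFloat() must fall below 0.05), so the same branch goes the same way roughly 95% of the time.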