Why is a ternary operator with two constants faster than one with a variable?

Question

In Java, I have two different statements which accomplish the same result through using ternary operators, which are as follows:


  1. num < 0 ? 0 : num;
  2. num * (num < 0 ? 0 : 1);

The second statement looks unnecessarily complex, so I expected it to take longer than the first. However, when I timed each one using the following code, the results were as follows:

final long startTime = System.currentTimeMillis();

Random rand = new Random();
float[] results = new float[100000000];
for (int i = 0; i < 100000000; i++) {
    float num = (rand.nextFloat() * 2) - 1;
    results[i] = num < 0 ? 0 : num;
    //results[i] = num * (num < 0 ? 0 : 1);
}

final long endTime = System.currentTimeMillis();

System.out.println("Total Time: " + (endTime - startTime));




  1. 1.232 seconds
  2. 1.023 seconds

(Each averaged over 5 runs)

Why is there this significant speedup when using the second statement? It seems to include an unnecessary multiplication and have the same comparison. Does the first create a branch whilst the second does not?

Recommended answer

First, let's rewrite the benchmark with JMH to avoid common benchmarking pitfalls.

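import java.util.concurrent.ThreadLocalRandom;
import org.openjdk.jmh.annotations.Benchmark;

public class FloatCompare {

    // Statement 1: ternary with a constant and a variable.
    @Benchmark
    public float cmp() {
        float num = ThreadLocalRandom.current().nextFloat() * 2 - 1;
        return num < 0 ? 0 : num;
    }

    // Statement 2: multiply by a ternary with two integer constants.
    @Benchmark
    public float mul() {
        float num = ThreadLocalRandom.current().nextFloat() * 2 - 1;
        return num * (num < 0 ? 0 : 1);
    }
}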

JMH also shows that the multiplication code is much faster:

Benchmark         Mode  Cnt   Score   Error  Units
FloatCompare.cmp  avgt    5  12,940 ± 0,166  ns/op
FloatCompare.mul  avgt    5   6,182 ± 0,101  ns/op

Now it's time to engage the perfasm profiler (built into JMH) to see the assembly produced by the JIT compiler. Here are the most important parts of the output (comments are mine):

cmp method:

  5,65%  │││  0x0000000002e717d0: vxorps  xmm1,xmm1,xmm1  ; xmm1 := 0
  0,28%  │││  0x0000000002e717d4: vucomiss xmm1,xmm0      ; compare num < 0 ?
  4,25%  │╰│  0x0000000002e717d8: jbe     2e71720h        ; jump if num >= 0
  9,77%  │ ╰  0x0000000002e717de: jmp     2e71711h        ; jump if num < 0

mul method:

  1,59%  ││  0x000000000321f90c: vxorps  xmm1,xmm1,xmm1    ; xmm1 := 0
  3,80%  ││  0x000000000321f910: mov     r11d,1h           ; r11d := 1
         ││  0x000000000321f916: xor     r8d,r8d           ; r8d := 0
         ││  0x000000000321f919: vucomiss xmm1,xmm0        ; compare num < 0 ?
  2,23%  ││  0x000000000321f91d: cmovnbe r11d,r8d          ; r11d := r8d if num < 0
  5,06%  ││  0x000000000321f921: vcvtsi2ss xmm1,xmm1,r11d  ; xmm1 := (float) r11d
  7,04%  ││  0x000000000321f926: vmulss  xmm0,xmm1,xmm0    ; multiply

The key difference is that there are no jump instructions in the mul method. Instead, the conditional move instruction cmovnbe is used.

cmov works with integer registers. Since the (num < 0 ? 0 : 1) expression uses integer constants on the right side, the JIT is smart enough to emit a conditional move instead of a conditional jump.
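
To make the typing explicit, here is a minimal sketch of the two statements from the question. The class and method names (TernaryTypes, viaTernary, viaMultiply) are hypothetical and used only for illustration. It shows that the inner ternary is an int that is widened to float before the multiply, which matches the vcvtsi2ss conversion in the assembly above. It also shows a small side effect: for negative input the multiplication yields -0.0 rather than 0.0, so the two statements agree under == but not bit for bit.

```java
public class TernaryTypes {
    // Hypothetical helper: statement 1 from the question.
    // The operands are an int constant and a float, so the result is a float.
    static float viaTernary(float num) {
        return num < 0 ? 0 : num;
    }

    // Hypothetical helper: statement 2 from the question, split in two steps.
    static float viaMultiply(float num) {
        int factor = num < 0 ? 0 : 1; // both operands are int constants, so the ternary is an int
        return num * factor;          // factor is widened to float before the multiply
    }

    public static void main(String[] args) {
        System.out.println(viaTernary(0.25f));   // prints 0.25
        System.out.println(viaMultiply(0.25f));  // prints 0.25
        System.out.println(viaTernary(-0.5f));   // prints 0.0
        System.out.println(viaMultiply(-0.5f));  // prints -0.0 (negative times zero keeps the sign)
    }
}
```

For the values in the benchmark this difference is harmless, since -0.0f == 0.0f evaluates to true, but it matters if the results are later compared with Float.compare or by their bit patterns.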

In this benchmark, the conditional jump is very inefficient, since branch prediction often fails due to the random nature of the numbers. That's why the branchless code of the mul method appears faster.

If we modify the benchmark in a way that one branch prevails over another, e.g. by replacing

ThreadLocalRandom.current().nextFloat() * 2 - 1

with

ThreadLocalRandom.current().nextFloat() * 2 - 0.1f

then branch prediction will work better, and the cmp method will become as fast as mul:

Benchmark         Mode  Cnt  Score   Error  Units
FloatCompare.cmp  avgt    5  5,793 ± 0,045  ns/op
FloatCompare.mul  avgt    5  5,764 ± 0,048  ns/op
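
To see why the modified input is easier to predict, the following sketch counts how often the num < 0 branch would be taken for each offset. It is not a benchmark and measures no timings. The class name BranchBias, the method negativeFraction, and the use of a seeded java.util.Random (instead of ThreadLocalRandom, to keep the run repeatable) are assumptions made for illustration.

```java
import java.util.Random;

public class BranchBias {
    // Fraction of samples for which `num < 0` holds,
    // where num = nextFloat() * 2 - offset.
    static double negativeFraction(float offset, int samples, long seed) {
        Random rand = new Random(seed); // seeded so repeated runs give the same counts
        int negatives = 0;
        for (int i = 0; i < samples; i++) {
            float num = rand.nextFloat() * 2 - offset;
            if (num < 0) {
                negatives++;
            }
        }
        return (double) negatives / samples;
    }

    public static void main(String[] args) {
        int samples = 1_000_000;
        // Original input: values in [-1, 1), so the branch goes either way about equally often.
        System.out.println("offset 1.0: " + negativeFraction(1.0f, samples, 42L));
        // Modified input: values in [-0.1, 1.9), so the branch is taken only about 5% of the time.
        System.out.println("offset 0.1: " + negativeFraction(0.1f, samples, 42L));
    }
}
```

With the original offset the outcome is close to a coin flip, which is the worst case for a branch predictor. With the offset of 0.1f roughly nineteen out of twenty values are non-negative, so guessing "not negative" is almost always right.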
