布尔值(p ^ q)和(p!= q)之间有有用的区别吗? [英] Is there a useful difference between (p ^ q) and (p != q) for booleans?

查看:83
本文介绍了布尔值(p ^ q)和(p!= q)之间有有用的区别吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

Java有两种检查两个布尔值是否不同的方法.您可以将它们与!= 或与 ^ (异或)进行比较.当然,这两个运算符在所有情况下都会产生相同的结果.尽管如此,将两者都包含在内还是有道理的,例如在 XOR和NOT-EQUAL-之间有什么区别?到?.对于开发人员而言,根据上下文选择一个相对于另一个更有意义-有时正好是这些布尔值之一"读起来更好,而有时这两个布尔值不同"则可以更好地传达意图.因此,也许使用哪个应该是口味和风格的问题.

Java has two ways of checking whether two booleans differ. You can compare them with !=, or with ^ (xor). Of course, these two operators produce the same result in all cases. Still, it makes sense for both of them to be included, as discussed, for example, in What's the difference between XOR and NOT-EQUAL-TO?. It even makes sense for developers to prefer one over the other depending on context - sometimes "is exactly one of these booleans true" reads better, and other times "are these two booleans different" communicates intent better. So, perhaps which one to use should be a matter of taste and style.

令我惊讶的是javac并没有完全相同地对待它们!考虑这个课程:

What surprised me is that javac does not treat these identically! Consider this class:

class Test {
  public boolean xor(boolean p, boolean q) {
    return p ^ q;
  }
  public boolean inequal(boolean p, boolean q) {
    return p != q;
  }
}

显然,这两种方法具有相同的可见行为.但是它们具有不同的字节码:

Obviously, the two methods have the same visible behavior. But they have different bytecode:

$ javap -c Test
Compiled from "Test.java"
class Test {
  Test();
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return

  public boolean xor(boolean, boolean);
    Code:
       0: iload_1
       1: iload_2
       2: ixor
       3: ireturn

  public boolean inequal(boolean, boolean);
    Code:
       0: iload_1
       1: iload_2
       2: if_icmpeq     9
       5: iconst_1
       6: goto          10
       9: iconst_0
      10: ireturn
}

如果我不得不猜测,我会说 xor 的性能更好,因为它只返回比较的结果;增加跳跃和额外的负担似乎是浪费的工作.但是我没有猜测,而是使用Clojure的标准"基准测试工具对这两种方法的数十亿次调用进行了基准测试.足够接近,虽然xor看起来更快,但我对统计数据的了解还不够,无法说出结果是否显着:

If I had to guess, I'd say that xor performs better, since it just returns the result of its comparison; adding in a jump and an extra load just seems like wasted work. But instead of guessing, I benchmarked a few billion calls to both methods using Clojure's "criterium" benchmarking tool. It's close enough that while it looks like xor is a bit faster I'm not good enough at statistics to say whether the results are significant:

user=> (let [t (Test.)] (bench (.xor t true false)))
Evaluation count : 4681301040 in 60 samples of 78021684 calls.
             Execution time mean : 4.273428 ns
    Execution time std-deviation : 0.168423 ns
   Execution time lower quantile : 4.044192 ns ( 2.5%)
   Execution time upper quantile : 4.649796 ns (97.5%)
                   Overhead used : 8.723577 ns

Found 2 outliers in 60 samples (3.3333 %)
    low-severe   2 (3.3333 %)
 Variance from outliers : 25.4745 % Variance is moderately inflated by outliers
user=> (let [t (Test.)] (bench (.inequal t true false)))
Evaluation count : 4570766220 in 60 samples of 76179437 calls.
             Execution time mean : 4.492847 ns
    Execution time std-deviation : 0.162946 ns
   Execution time lower quantile : 4.282077 ns ( 2.5%)
   Execution time upper quantile : 4.813433 ns (97.5%)
                   Overhead used : 8.723577 ns

Found 2 outliers in 60 samples (3.3333 %)
    low-severe   2 (3.3333 %)
 Variance from outliers : 22.2554 % Variance is moderately inflated by outliers

出于性能方面的考虑,是否有某些理由更喜欢编写一个而不是另一个?在某些情况下,它们的实现方式上的差异使一种方法比另一种方法更适合?或者,有人知道为什么javac如此不同地实现这两个相同的操作吗?

Is there some reason to prefer writing one over the other, performance-wise1? Some context in which the difference in their implementation makes one more suitable than the other? Or, does anyone know why javac implements these two identical operations so differently?

1 当然,我不会鲁ck地使用此信息进行微优化.我很好奇这一切如何工作.

1 Of course, I will not recklessly use this information to micro-optimize. I'm just curious how this all works.

推荐答案

好吧,我将提供CPU如何立即翻译并更新帖子,但是与此同时,您看到的是waaaay差异太小了,无法照顾.

Well, I am going to provide how the CPU translates that shortly and update the post, but in the meanwhile, you are looking at waaaay too small difference to care.

字节代码并不表示方法将执行(或不执行)的速度,有两种JIT编译器一旦足够热就会使该方法看起来完全不同.同样,众所周知 javac 进行编译代码后几乎不会进行任何优化,真正的优化来自 JIT .

byte-code in java is not an indication of how fast (or not) a method will execute, there are two JIT compilers that will make this method look entirely different once they are hot enough. also javac is known to do very little optimizations once it compiles the code, the real optimizations come from JIT.

我为此使用 JMH 进行了一些测试,这些测试仅使用 C1 编译器或将 C2 替换为 GraalVM 或根本没有 JIT ...(后面有很多测试代码,您可以跳过它,只看结果,这是使用 jdk-12 btw完成的).这段代码使用 JMH -在Java世界中使用的事实上的工具微型基准(众所周知,如果手动完成,则容易出错).

I've put up some tests using JMH for this using either C1 compiler only or replacing C2 with GraalVM or no JIT at all... (lots of testing code follows, you can skip it and just look at the results, this is done using jdk-12 btw). This code is using JMH - the de facto tool to use in java world of micro-benchmarks (which are notoriously error-prone if done by hand).

@Warmup(iterations = 10)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Measurement(iterations = 2, time = 2, timeUnit = TimeUnit.SECONDS)
public class BooleanCompare {

    public static void main(String[] args) throws Exception {
        Options opt = new OptionsBuilder()
            .include(BooleanCompare.class.getName())
            .build();

        new Runner(opt).run();
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @Fork(1)
    public boolean xor(BooleanExecutionPlan plan) {
        return plan.booleans()[0] ^ plan.booleans()[1];
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @Fork(1)
    public boolean plain(BooleanExecutionPlan plan) {
        return plan.booleans()[0] != plan.booleans()[1];
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @Fork(value = 1, jvmArgsAppend = "-Xint")
    public boolean xorNoJIT(BooleanExecutionPlan plan) {
        return plan.booleans()[0] != plan.booleans()[1];
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @Fork(value = 1, jvmArgsAppend = "-Xint")
    public boolean plainNoJIT(BooleanExecutionPlan plan) {
        return plan.booleans()[0] != plan.booleans()[1];
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @Fork(value = 1, jvmArgsAppend = "-XX:-TieredCompilation")
    public boolean xorC2Only(BooleanExecutionPlan plan) {
        return plan.booleans()[0] != plan.booleans()[1];
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @Fork(value = 1, jvmArgsAppend = "-XX:-TieredCompilation")
    public boolean plainC2Only(BooleanExecutionPlan plan) {
        return plan.booleans()[0] != plan.booleans()[1];
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @Fork(value = 1, jvmArgsAppend = "-XX:TieredStopAtLevel=1")
    public boolean xorC1Only(BooleanExecutionPlan plan) {
        return plan.booleans()[0] != plan.booleans()[1];
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @Fork(value = 1, jvmArgsAppend = "-XX:TieredStopAtLevel=1")
    public boolean plainC1Only(BooleanExecutionPlan plan) {
        return plan.booleans()[0] != plan.booleans()[1];
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @Fork(value = 1,
        jvmArgsAppend = {
            "-XX:+UnlockExperimentalVMOptions",
            "-XX:+EagerJVMCI",
            "-Dgraal.ShowConfiguration=info",
            "-XX:+UseJVMCICompiler",
            "-XX:+EnableJVMCI"
        })
    public boolean xorGraalVM(BooleanExecutionPlan plan) {
        return plan.booleans()[0] != plan.booleans()[1];
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @Fork(value = 1,
        jvmArgsAppend = {
            "-XX:+UnlockExperimentalVMOptions",
            "-XX:+EagerJVMCI",
            "-Dgraal.ShowConfiguration=info",
            "-XX:+UseJVMCICompiler",
            "-XX:+EnableJVMCI"
        })
    public boolean plainGraalVM(BooleanExecutionPlan plan) {
        return plan.booleans()[0] != plan.booleans()[1];
    }

}

结果:

BooleanCompare.plain         avgt    2    3.125          ns/op
BooleanCompare.xor           avgt    2    2.976          ns/op

BooleanCompare.plainC1Only   avgt    2    3.400          ns/op
BooleanCompare.xorC1Only     avgt    2    3.379          ns/op

BooleanCompare.plainC2Only   avgt    2    2.583          ns/op
BooleanCompare.xorC2Only     avgt    2    2.685          ns/op

BooleanCompare.plainGraalVM  avgt    2    2.980          ns/op
BooleanCompare.xorGraalVM    avgt    2    3.868          ns/op

BooleanCompare.plainNoJIT    avgt    2  243.348          ns/op
BooleanCompare.xorNoJIT      avgt    2  201.342          ns/op


虽然我有时喜欢这么做,但我不是一个能读懂汇编程序的多才多艺的人.这是一些有趣的事情.如果我们这样做:


I am not a versatile enough person to read assembler, though I sometimes like to do that... Here are some interesting things. If we do:

仅使用!=

/*
 * run many iterations of this with :
 *  java -XX:+UnlockDiagnosticVMOptions  
 *       -XX:TieredStopAtLevel=1  
 *       "-XX:CompileCommand=print,com/so/BooleanCompare.compare"  
 *       com.so.BooleanCompare
 */
public static boolean compare(boolean left, boolean right) {
    return left != right;
}

我们得到:

  0x000000010d1b2bc7: push   %rbp
  0x000000010d1b2bc8: sub    $0x30,%rsp  ;*iload_0 {reexecute=0 rethrow=0 return_oop=0}
                                         ; - com.so.BooleanCompare::compare@0 (line 22)

  0x000000010d1b2bcc: cmp    %edx,%esi
  0x000000010d1b2bce: mov    $0x0,%eax
  0x000000010d1b2bd3: je     0x000000010d1b2bde
  0x000000010d1b2bd9: mov    $0x1,%eax
  0x000000010d1b2bde: and    $0x1,%eax
  0x000000010d1b2be1: add    $0x30,%rsp
  0x000000010d1b2be5: pop    %rbp

对我来说,这段代码有点明显:将0放入 eax ,将 compare(edx,esi)->如果不相等,则将1放入 eax.返回 eax&1 .

To me, this code is a bit obvious: put 0 into eax, compare (edx, esi) -> if not equal put 1 into eax. return eax & 1.

带有^:的C1编译器

C1 compiler with ^:

public static boolean compare(boolean left, boolean right) {
     return left ^ right;
}



  # parm0:    rsi       = boolean
  # parm1:    rdx       = boolean
  #           [sp+0x40]  (sp of caller)
  0x000000011326e5c0: mov    %eax,-0x14000(%rsp)
  0x000000011326e5c7: push   %rbp
  0x000000011326e5c8: sub    $0x30,%rsp   ;*iload_0 {reexecute=0 rethrow=0 return_oop=0}
                                          ; - com.so.BooleanCompare::compare@0 (line 22)

  0x000000011326e5cc: xor    %rdx,%rsi
  0x000000011326e5cf: and    $0x1,%esi
  0x000000011326e5d2: mov    %rsi,%rax
  0x000000011326e5d5: add    $0x30,%rsp
  0x000000011326e5d9: pop    %rbp

我真的不知道为什么这里需要和$ 0x1,%esi ,否则我猜这也很简单.

I don't really know why and $0x1,%esi is needed here, otherwise this is fairly simple too, I guess.

但是,如果启用C2编译器,事情将会变得更加有趣.

But if I enable C2 compiler, things are a lot more interesting.

/**
 * run with java
 * -XX:+UnlockDiagnosticVMOptions
 * -XX:CICompilerCount=2
 * -XX:-TieredCompilation
 * "-XX:CompileCommand=print,com/so/BooleanCompare.compare"
 * com.so.BooleanCompare
 */
public static boolean compare(boolean left, boolean right) {
    return left != right;
}



  # parm0:    rsi       = boolean
  # parm1:    rdx       = boolean
  #           [sp+0x20]  (sp of caller)
  0x000000011a2bbfa0: sub    $0x18,%rsp
  0x000000011a2bbfa7: mov    %rbp,0x10(%rsp)                

  0x000000011a2bbfac: xor    %r10d,%r10d
  0x000000011a2bbfaf: mov    $0x1,%eax
  0x000000011a2bbfb4: cmp    %edx,%esi
  0x000000011a2bbfb6: cmove  %r10d,%eax                     

  0x000000011a2bbfba: add    $0x10,%rsp
  0x000000011a2bbfbe: pop    %rbp

我什至都看不到经典的结尾句 push ebp;mov ebp,esp;sub esp,x ,而是通过以下方式(至少对我而言)非常不寻常的东西:

I don't even see the classic epilog push ebp; mov ebp, esp; sub esp, x, instead something very un-usual (at least for me) via:

 sub    $0x18,%rsp
 mov    %rbp,0x10(%rsp)

 ....
 add    $0x10,%rsp
 pop    %rbp

再一次,比我更灵活的人可以充满希望地进行解释.否则,它就像生成的 C1 的更好版本:

Again, someone more versatile than me, can explain hopefully. Otherwise it's like a better version of the C1 generated:

xor    %r10d,%r10d // put zero into r10d
mov    $0x1,%eax   // put 1 into eax
cmp    %edx,%esi   // compare edx and esi
cmove  %r10d,%eax  // conditionally move the contents of r10d into eax

由于分支预测,

AFAIK cmp/cmove cmp/je 好-至少这是我读过的...

AFAIK cmp/cmove is better than cmp/je because of branch-prediction - this is at least what I've read...

使用C2编译器进行XOR:

XOR with C2 compiler:

public static boolean compare(boolean left, boolean right) {
    return left ^ right;
}



  0x000000010e6c9a20: sub    $0x18,%rsp
  0x000000010e6c9a27: mov    %rbp,0x10(%rsp)                

  0x000000010e6c9a2c: xor    %edx,%esi
  0x000000010e6c9a2e: mov    %esi,%eax
  0x000000010e6c9a30: and    $0x1,%eax
  0x000000010e6c9a33: add    $0x10,%rsp
  0x000000010e6c9a37: pop    %rbp

肯定看起来与生成的 C1 编译器几乎相同.

It sure looks like it's almost the same as C1 compiler generated.

这篇关于布尔值(p ^ q)和(p!= q)之间有有用的区别吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆