Is there a useful difference between (p ^ q) and (p != q) for booleans?
Question
Java has two ways of checking whether two booleans differ. You can compare them with !=, or with ^ (xor). Of course, these two operators produce the same result in all cases. Still, it makes sense for both of them to be included, as discussed, for example, in "What's the difference between XOR and NOT-EQUAL-TO?". It even makes sense for developers to prefer one over the other depending on context - sometimes "is exactly one of these booleans true" reads better, and other times "are these two booleans different" communicates intent better. So, perhaps which one to use should be a matter of taste and style.
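As a quick sanity check, the equivalence is easy to verify exhaustively (a small standalone class of my own, not from the original question):

```java
// Exhaustive check: for booleans, p ^ q and p != q always agree.
public class XorVsNotEqual {
    public static void main(String[] args) {
        boolean[] values = {false, true};
        for (boolean p : values) {
            for (boolean q : values) {
                if ((p ^ q) != (p != q)) {
                    throw new AssertionError("mismatch at p=" + p + ", q=" + q);
                }
            }
        }
        // One small practical difference: ^ has a compound-assignment form.
        boolean p = true;
        p ^= true;  // flips p; there is no analogous compound form of !=
        System.out.println("all four input pairs agree; p=" + p);
        // prints: all four input pairs agree; p=false
    }
}
```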
What surprised me is that javac does not treat these identically! Consider this class:
class Test {
    public boolean xor(boolean p, boolean q) {
        return p ^ q;
    }

    public boolean inequal(boolean p, boolean q) {
        return p != q;
    }
}
Obviously, the two methods have the same visible behavior. But they have different bytecode:
$ javap -c Test
Compiled from "Test.java"
class Test {
  Test();
    Code:
       0: aload_0
       1: invokespecial #1    // Method java/lang/Object."<init>":()V
       4: return

  public boolean xor(boolean, boolean);
    Code:
       0: iload_1
       1: iload_2
       2: ixor
       3: ireturn

  public boolean inequal(boolean, boolean);
    Code:
       0: iload_1
       1: iload_2
       2: if_icmpeq     9
       5: iconst_1
       6: goto          10
       9: iconst_0
      10: ireturn
}
If I had to guess, I'd say that xor performs better, since it just returns the result of its comparison; adding in a jump and an extra load just seems like wasted work. But instead of guessing, I benchmarked a few billion calls to both methods using Clojure's "criterium" benchmarking tool. While it looks like xor is a bit faster, the results are close enough that I'm not good enough at statistics to say whether they are significant:
user=> (let [t (Test.)] (bench (.xor t true false)))
Evaluation count : 4681301040 in 60 samples of 78021684 calls.
Execution time mean : 4.273428 ns
Execution time std-deviation : 0.168423 ns
Execution time lower quantile : 4.044192 ns ( 2.5%)
Execution time upper quantile : 4.649796 ns (97.5%)
Overhead used : 8.723577 ns
Found 2 outliers in 60 samples (3.3333 %)
low-severe 2 (3.3333 %)
Variance from outliers : 25.4745 % Variance is moderately inflated by outliers
user=> (let [t (Test.)] (bench (.inequal t true false)))
Evaluation count : 4570766220 in 60 samples of 76179437 calls.
Execution time mean : 4.492847 ns
Execution time std-deviation : 0.162946 ns
Execution time lower quantile : 4.282077 ns ( 2.5%)
Execution time upper quantile : 4.813433 ns (97.5%)
Overhead used : 8.723577 ns
Found 2 outliers in 60 samples (3.3333 %)
low-severe 2 (3.3333 %)
Variance from outliers : 22.2554 % Variance is moderately inflated by outliers
Is there some reason to prefer writing one over the other, performance-wise¹? Some context in which the difference in their implementation makes one more suitable than the other? Or, does anyone know why javac implements these two identical operations so differently?
¹ Of course, I will not recklessly use this information to micro-optimize. I'm just curious how this all works.
Answer
Well, I am going to show how the CPU translates this shortly and update the post, but in the meanwhile: the difference you are looking at is waaaay too small to care about.
Bytecode in Java is not an indication of how fast (or not) a method will execute; there are two JIT compilers that will make these methods look entirely different once they are hot enough. Also, javac is known to do very few optimizations when it compiles code; the real optimizations come from the JIT.
I've put up some tests for this using JMH, running either with the C1 compiler only, with C2 replaced by GraalVM, or with no JIT at all... (lots of testing code follows; you can skip it and just look at the results; this was done using jdk-12, btw). The code uses JMH - the de facto micro-benchmarking tool in the Java world (micro-benchmarks being notoriously error-prone when done by hand).
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@Warmup(iterations = 10)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Measurement(iterations = 2, time = 2, timeUnit = TimeUnit.SECONDS)
public class BooleanCompare {

    // Minimal reconstruction of the @State class elided in the original post.
    @State(Scope.Benchmark)
    public static class BooleanExecutionPlan {
        private final boolean[] booleans = {true, false};

        public boolean[] booleans() {
            return booleans;
        }
    }

    public static void main(String[] args) throws Exception {
        Options opt = new OptionsBuilder()
                .include(BooleanCompare.class.getName())
                .build();
        new Runner(opt).run();
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @Fork(1)
    public boolean xor(BooleanExecutionPlan plan) {
        return plan.booleans()[0] ^ plan.booleans()[1];
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @Fork(1)
    public boolean plain(BooleanExecutionPlan plan) {
        return plan.booleans()[0] != plan.booleans()[1];
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @Fork(value = 1, jvmArgsAppend = "-Xint")
    public boolean xorNoJIT(BooleanExecutionPlan plan) {
        return plan.booleans()[0] ^ plan.booleans()[1];
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @Fork(value = 1, jvmArgsAppend = "-Xint")
    public boolean plainNoJIT(BooleanExecutionPlan plan) {
        return plan.booleans()[0] != plan.booleans()[1];
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @Fork(value = 1, jvmArgsAppend = "-XX:-TieredCompilation")
    public boolean xorC2Only(BooleanExecutionPlan plan) {
        return plan.booleans()[0] ^ plan.booleans()[1];
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @Fork(value = 1, jvmArgsAppend = "-XX:-TieredCompilation")
    public boolean plainC2Only(BooleanExecutionPlan plan) {
        return plan.booleans()[0] != plan.booleans()[1];
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @Fork(value = 1, jvmArgsAppend = "-XX:TieredStopAtLevel=1")
    public boolean xorC1Only(BooleanExecutionPlan plan) {
        return plan.booleans()[0] ^ plan.booleans()[1];
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @Fork(value = 1, jvmArgsAppend = "-XX:TieredStopAtLevel=1")
    public boolean plainC1Only(BooleanExecutionPlan plan) {
        return plan.booleans()[0] != plan.booleans()[1];
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @Fork(value = 1,
          jvmArgsAppend = {
                  "-XX:+UnlockExperimentalVMOptions",
                  "-XX:+EagerJVMCI",
                  "-Dgraal.ShowConfiguration=info",
                  "-XX:+UseJVMCICompiler",
                  "-XX:+EnableJVMCI"
          })
    public boolean xorGraalVM(BooleanExecutionPlan plan) {
        return plan.booleans()[0] ^ plan.booleans()[1];
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @Fork(value = 1,
          jvmArgsAppend = {
                  "-XX:+UnlockExperimentalVMOptions",
                  "-XX:+EagerJVMCI",
                  "-Dgraal.ShowConfiguration=info",
                  "-XX:+UseJVMCICompiler",
                  "-XX:+EnableJVMCI"
          })
    public boolean plainGraalVM(BooleanExecutionPlan plan) {
        return plan.booleans()[0] != plan.booleans()[1];
    }
}
The results:
BooleanCompare.plain          avgt    2    3.125   ns/op
BooleanCompare.xor            avgt    2    2.976   ns/op
BooleanCompare.plainC1Only    avgt    2    3.400   ns/op
BooleanCompare.xorC1Only      avgt    2    3.379   ns/op
BooleanCompare.plainC2Only    avgt    2    2.583   ns/op
BooleanCompare.xorC2Only      avgt    2    2.685   ns/op
BooleanCompare.plainGraalVM   avgt    2    2.980   ns/op
BooleanCompare.xorGraalVM     avgt    2    3.868   ns/op
BooleanCompare.plainNoJIT     avgt    2  243.348   ns/op
BooleanCompare.xorNoJIT       avgt    2  201.342   ns/op
I am not a versatile enough person to read assembler fluently, though I sometimes like to try... Here are some interesting things. If we do:
!= with the C1 compiler:
/*
 * run many iterations of this with:
 * java -XX:+UnlockDiagnosticVMOptions
 *      -XX:TieredStopAtLevel=1
 *      "-XX:CompileCommand=print,com/so/BooleanCompare.compare"
 *      com.so.BooleanCompare
 */
public static boolean compare(boolean left, boolean right) {
    return left != right;
}
we get:
0x000000010d1b2bc7: push %rbp
0x000000010d1b2bc8: sub $0x30,%rsp ;*iload_0 {reexecute=0 rethrow=0 return_oop=0}
; - com.so.BooleanCompare::compare@0 (line 22)
0x000000010d1b2bcc: cmp %edx,%esi
0x000000010d1b2bce: mov $0x0,%eax
0x000000010d1b2bd3: je 0x000000010d1b2bde
0x000000010d1b2bd9: mov $0x1,%eax
0x000000010d1b2bde: and $0x1,%eax
0x000000010d1b2be1: add $0x30,%rsp
0x000000010d1b2be5: pop %rbp
To me, this code is a bit obvious: put 0 into eax, compare edx and esi, and if they are not equal, put 1 into eax; then return eax & 1.
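The two compiled shapes can be mimicked at the Java level for illustration (my own sketch, not from the post; it uses 0/1 ints the way HotSpot holds booleans in integer registers, and the method names are made up):

```java
// Illustrative sketch: emulate the two compiled shapes with 0/1 ints,
// mirroring how HotSpot keeps boolean values in integer registers.
public class CompiledShapes {
    // Branchy shape, as C1 compiles '!=': compare, conditionally set, mask.
    static int branchy(int p, int q) {
        int eax = 0;        // mov $0x0,%eax
        if (p != q) {       // cmp %edx,%esi / je ...
            eax = 1;        // mov $0x1,%eax
        }
        return eax & 1;     // and $0x1,%eax
    }

    // Branchless shape, as C1 compiles '^': xor the bits, mask the low bit.
    static int branchless(int p, int q) {
        return (p ^ q) & 1; // xor %rdx,%rsi / and $0x1,%esi
    }

    public static void main(String[] args) {
        for (int p = 0; p <= 1; p++) {
            for (int q = 0; q <= 1; q++) {
                if (branchy(p, q) != branchless(p, q)) {
                    throw new AssertionError("mismatch at " + p + "," + q);
                }
            }
        }
        System.out.println("both shapes agree on 0/1 inputs");
        // prints: both shapes agree on 0/1 inputs
    }
}
```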
C1 compiler with ^:
public static boolean compare(boolean left, boolean right) {
    return left ^ right;
}
# parm0: rsi = boolean
# parm1: rdx = boolean
# [sp+0x40] (sp of caller)
0x000000011326e5c0: mov %eax,-0x14000(%rsp)
0x000000011326e5c7: push %rbp
0x000000011326e5c8: sub $0x30,%rsp ;*iload_0 {reexecute=0 rethrow=0 return_oop=0}
; - com.so.BooleanCompare::compare@0 (line 22)
0x000000011326e5cc: xor %rdx,%rsi
0x000000011326e5cf: and $0x1,%esi
0x000000011326e5d2: mov %rsi,%rax
0x000000011326e5d5: add $0x30,%rsp
0x000000011326e5d9: pop %rbp
I don't really know why the and $0x1,%esi is needed here (presumably it masks the register down to a valid 0/1 boolean value); otherwise this is fairly simple too, I guess.
But if I enable the C2 compiler, things get a lot more interesting.
/**
 * run with java
 * -XX:+UnlockDiagnosticVMOptions
 * -XX:CICompilerCount=2
 * -XX:-TieredCompilation
 * "-XX:CompileCommand=print,com/so/BooleanCompare.compare"
 * com.so.BooleanCompare
 */
public static boolean compare(boolean left, boolean right) {
    return left != right;
}
# parm0: rsi = boolean
# parm1: rdx = boolean
# [sp+0x20] (sp of caller)
0x000000011a2bbfa0: sub $0x18,%rsp
0x000000011a2bbfa7: mov %rbp,0x10(%rsp)
0x000000011a2bbfac: xor %r10d,%r10d
0x000000011a2bbfaf: mov $0x1,%eax
0x000000011a2bbfb4: cmp %edx,%esi
0x000000011a2bbfb6: cmove %r10d,%eax
0x000000011a2bbfba: add $0x10,%rsp
0x000000011a2bbfbe: pop %rbp
I don't even see the classic prologue push ebp; mov ebp, esp; sub esp, x; instead there is something quite unusual (at least to me):
sub $0x18,%rsp
mov %rbp,0x10(%rsp)
....
add $0x10,%rsp
pop %rbp
Again, someone more versatile than me can hopefully explain. Otherwise it's like a better version of what C1 generated:
xor %r10d,%r10d // put zero into r10d
mov $0x1,%eax // put 1 into eax
cmp %edx,%esi // compare edx and esi
cmove %r10d,%eax // conditionally move the contents of r10d into eax
AFAIK cmp/cmove is better than cmp/je because it avoids a branch entirely, and thus any branch misprediction - this is at least what I've read...
XOR with the C2 compiler:
public static boolean compare(boolean left, boolean right) {
    return left ^ right;
}
0x000000010e6c9a20: sub $0x18,%rsp
0x000000010e6c9a27: mov %rbp,0x10(%rsp)
0x000000010e6c9a2c: xor %edx,%esi
0x000000010e6c9a2e: mov %esi,%eax
0x000000010e6c9a30: and $0x1,%eax
0x000000010e6c9a33: add $0x10,%rsp
0x000000010e6c9a37: pop %rbp
It sure looks like it's almost the same as what the C1 compiler generated.