通过JMH在sun.misc.Unsafe.compareAndSwap测量中的奇怪行为 [英] Strange behavior in sun.misc.Unsafe.compareAndSwap measurement via JMH

查看:72
本文介绍了通过JMH在sun.misc.Unsafe.compareAndSwap测量中的奇怪行为的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我决定使用不同的锁定策略并使用JMH来测量增量.我正在使用JMH来检查吞吐量和平均时间,以及用于检查正确性的简单自定义测试.有六种策略:

I've decided to measure incrementation with different locking strategies and using JMH for this purpose. I'm using JMH for checking throughput and average time as well as simple custom test for checking correctness. There are six strategies:

  • 原子数
  • ReadWrite锁定计数
  • 与volatile同步
  • 没有挥发的同步块
  • sun.misc.Unsafe.compareAndSwap
  • sun.misc.Unsafe.getAndAdd
  • 不同步计数

基准代码:

@State(Scope.Benchmark)
@BenchmarkMode({Mode.Throughput, Mode.AverageTime})
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Fork(1)
@Warmup(iterations = 5)
@Measurement(iterations = 5)
public class UnsafeCounter_Benchmark {
    public Counter unsync, syncNoV, syncV, lock, atomic, unsafe, unsafeGA;

    @Setup(Level.Iteration)
    public void prepare() {
        unsync = new UnsyncCounter();
        syncNoV = new SyncNoVolatileCounter();
        syncV = new SyncVolatileCounter();
        lock = new LockCounter();
        atomic = new AtomicCounter();
        unsafe = new UnsafeCASCounter();
        unsafeGA = new UnsafeGACounter();
    }

    @Benchmark
    public void unsyncCount() {
        unsyncCounter();
    }

    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public void unsyncCounter() {
        unsync.increment();
    }

    @Benchmark
    public void syncNoVCount() {
        syncNoVCounter();
    }

    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public void syncNoVCounter() {
        syncNoV.increment();
    }

    @Benchmark
    public void syncVCount() {
        syncVCounter();
    }

    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public void syncVCounter() {
        syncV.increment();
    }

    @Benchmark
    public void lockCount() {
        lockCounter();
    }

    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public void lockCounter() {
        lock.increment();
    }

    @Benchmark
    public void atomicCount() {
        atomicCounter();
    }

    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public void atomicCounter() {
        atomic.increment();
    }

    @Benchmark
    public void unsafeCount() {
        unsafeCounter();
    }

    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public void unsafeCounter() {
        unsafe.increment();
    }

    @Benchmark
    public void unsafeGACount() {
        unsafeGACounter();
    }

    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public void unsafeGACounter() {
        unsafeGA.increment();
    }

    public static void main(String[] args) throws RunnerException {
        Options baseOpts = new OptionsBuilder()
                .include(UnsafeCounter_Benchmark.class.getSimpleName())
                .threads(100)
                .jvmArgs("-ea")
                .build();

        new Runner(baseOpts).run();
    }
}

和替补席的结果:

JDK 8u20

Benchmark                                         Mode  Samples   Score    Error   Units
o.k.u.u.UnsafeCounter_Benchmark.atomicCount      thrpt        5  42.178 ± 17.643  ops/us
o.k.u.u.UnsafeCounter_Benchmark.lockCount        thrpt        5  24.044 ±  2.264  ops/us
o.k.u.u.UnsafeCounter_Benchmark.syncNoVCount     thrpt        5  22.849 ±  1.344  ops/us
o.k.u.u.UnsafeCounter_Benchmark.syncVCount       thrpt        5  20.235 ±  2.027  ops/us
o.k.u.u.UnsafeCounter_Benchmark.unsafeCount      thrpt        5  12.460 ±  1.326  ops/us
o.k.u.u.UnsafeCounter_Benchmark.unsafeGACount    thrpt        5  39.106 ±  2.966  ops/us
o.k.u.u.UnsafeCounter_Benchmark.unsyncCount      thrpt        5  93.076 ±  9.674  ops/us
o.k.u.u.UnsafeCounter_Benchmark.atomicCount       avgt        5   2.604 ±  0.133   us/op
o.k.u.u.UnsafeCounter_Benchmark.lockCount         avgt        5   4.161 ±  0.546   us/op
o.k.u.u.UnsafeCounter_Benchmark.syncNoVCount      avgt        5   4.440 ±  0.523   us/op
o.k.u.u.UnsafeCounter_Benchmark.syncVCount        avgt        5   5.073 ±  0.439   us/op
o.k.u.u.UnsafeCounter_Benchmark.unsafeCount       avgt        5   9.088 ±  5.964   us/op
o.k.u.u.UnsafeCounter_Benchmark.unsafeGACount     avgt        5   2.611 ±  0.164   us/op
o.k.u.u.UnsafeCounter_Benchmark.unsyncCount       avgt        5   1.047 ±  0.050   us/op

我期望的最大测量结果,除了 UnsafeCounter_Benchmark.unsafeCount ,它与 sun.misc.Unsafe.compareAndSwapLong while 循环一起使用.这是最慢的锁定.

The most of measurement as I expect, except UnsafeCounter_Benchmark.unsafeCount which is used sun.misc.Unsafe.compareAndSwapLong with while loop. It the the slowest locking.

public void increment() {
    long before = counter;
    while (!unsafe.compareAndSwapLong(this, offset, before, before + 1L)) {
        before = counter;
    }
}

我认为性能低下是由于while循环和JMH引起了更高的争用,但是当我由 Executors 检查了正确性后,我得到了预期的数字:

I suggest that low performance is because of while loop and JMH makes higher contention, but when I've checked correctness by Executors I get figures as I expect:

Counter result: UnsyncCounter 97538676
Time passed in ms:259
Counter result: AtomicCounter 100000000
Time passed in ms:1805
Counter result: LockCounter 100000000
Time passed in ms:3904
Counter result: SyncNoVolatileCounter 100000000
Time passed in ms:14227
Counter result: SyncVolatileCounter 100000000
Time passed in ms:19224
Counter result: UnsafeCASCounter 100000000
Time passed in ms:8077
Counter result: UnsafeGACounter 100000000
Time passed in ms:2549

正确性测试代码:

public class UnsafeCounter_Test {
    static class CounterClient implements Runnable {
        private Counter c;
        private int num;

        public CounterClient(Counter c, int num) {
            this.c = c;
            this.num = num;
        }

        @Override
        public void run() {
            for (int i = 0; i < num; i++) {
                c.increment();
            }
        }
    }

    public static void makeTest(Counter counter) throws InterruptedException {
        int NUM_OF_THREADS = 1000;
        int NUM_OF_INCREMENTS = 100000;
        ExecutorService service = Executors.newFixedThreadPool(NUM_OF_THREADS);
        long before = System.currentTimeMillis();
        for (int i = 0; i < NUM_OF_THREADS; i++) {
            service.submit(new CounterClient(counter, NUM_OF_INCREMENTS));
        }
        service.shutdown();
        service.awaitTermination(1, TimeUnit.MINUTES);
        long after = System.currentTimeMillis();
        System.out.println("Counter result: " + counter.getClass().getSimpleName() + " " + counter.getCounter());
        System.out.println("Time passed in ms:" + (after - before));
    }

    public static void main(String[] args) throws InterruptedException {
        makeTest(new UnsyncCounter());
        makeTest(new AtomicCounter());
        makeTest(new LockCounter());
        makeTest(new SyncNoVolatileCounter());
        makeTest(new SyncVolatileCounter());
        makeTest(new UnsafeCASCounter());
        makeTest(new UnsafeGACounter());
    }
}

我知道这是一个非常糟糕的测试,但是在这种情况下,不安全的CAS比Sync变体快两倍,并且一切都按预期进行.有人可以澄清所描述的行为吗?有关更多信息,请参见GitHub存储库:

I know that it is very awful test, but in this case Unsafe CAS two times faster than Sync variants and everything goes as expected. Could somebody clarify described behavior? For more information please see GitHub repo: Bench, Unsafe CAS counter

推荐答案

大声思考:值得注意的是,人们经常完成90%的乏味工作,而将10%(从乐趣开始的地方)留给别人!好吧,我正在享受所有的乐趣!

Thinking out loud: it is remarkable how often people do 90% of the tedious work, and leave the 10% (where the fun begins) for someone else! All right, I'm taking all the fun!

让我首先在i7-4790K(8u40 EA)上重复该实验:

Let me repeat the experiment first on my i7-4790K, 8u40 EA:

Benchmark                                 Mode  Samples    Score    Error   Units
UnsafeCounter_Benchmark.atomicCount      thrpt        5   47.669 ± 18.440  ops/us
UnsafeCounter_Benchmark.lockCount        thrpt        5   14.497 ±  7.815  ops/us
UnsafeCounter_Benchmark.syncNoVCount     thrpt        5   11.618 ±  2.130  ops/us
UnsafeCounter_Benchmark.syncVCount       thrpt        5   11.337 ±  4.532  ops/us
UnsafeCounter_Benchmark.unsafeCount      thrpt        5    7.452 ±  1.042  ops/us
UnsafeCounter_Benchmark.unsafeGACount    thrpt        5   43.332 ±  3.435  ops/us
UnsafeCounter_Benchmark.unsyncCount      thrpt        5  102.773 ± 11.943  ops/us

确实,关于 unsafeCount 测试似乎有些可疑.确实,您必须先验证所有数据,然后才能对其进行验证.对于nanobenchmark,您必须验证生成的代码,以查看是否实际测量了要测量的东西.在JMH中,使用 -prof perfasm 可以非常迅速地实现.实际上,如果您查看那里最热的 unsafeCount 区域,您会发现一些有趣的事情:

Truly, something seems fishy about unsafeCount test. Really, you have to presume all data is fishy before you validated it. For nanobenchmarks, you have to validate the generated code to see if you actually measure something you want to measure. In JMH, it is very quickly doable with -prof perfasm. In fact, if you look at the hottest region of unsafeCount there, you will notice a few funny things:

  0.12%    0.04%    0x00007fb45518e7d1: mov    0x10(%r10),%rax    
 17.03%   23.44%    0x00007fb45518e7d5: test   %eax,0x17318825(%rip)
  0.21%    0.07%    0x00007fb45518e7db: mov    0x18(%r10),%r11    ; getfield offset
 30.33%   10.77%    0x00007fb45518e7df: mov    %rax,%r8
  0.00%             0x00007fb45518e7e2: add    $0x1,%r8           
  0.01%             0x00007fb45518e7e6: cmp    0xc(%r10),%r12d    ; typecheck 
                    0x00007fb45518e7ea: je     0x00007fb45518e80b ; bail to v-call
  0.83%    0.48%    0x00007fb45518e7ec: lock cmpxchg %r8,(%r10,%r11,1)
 33.27%   25.52%    0x00007fb45518e7f2: sete   %r8b
  0.12%    0.01%    0x00007fb45518e7f6: movzbl %r8b,%r8d          
  0.03%    0.04%    0x00007fb45518e7fa: test   %r8d,%r8d
                    0x00007fb45518e7fd: je     0x00007fb45518e7d1 ; back branch

翻译:a)每次迭代都会重新读取 offset 字段-因为CAS内存效应意味着易失性读取,因此需要悲观地重新读取该字段;b)有趣的是,出于相同的原因, unsafe 字段也被 重新读取-出于相同的原因.

Translation: a) offset field gets re-read on each iteration -- because CAS memory effects imply volatile read, and therefore the field needs to be pessimistically re-read; b) the hilarious part is that unsafe field is also being re-read for a typecheck -- for the same reason.

这就是为什么高性能代码应如下所示:

This is why high-performance code should look like this:

--- a/utils bench/src/main/java/org/kirmit/utils/unsafe/concurrency/UnsafeCASCounter.java       
+++ b/utils bench/src/main/java/org/kirmit/utils/unsafe/concurrency/UnsafeCASCounter.java       
@@ -5,13 +5,13 @@ import sun.misc.Unsafe;

 public class UnsafeCASCounter implements Counter {
     private volatile long counter = 0;
-    private final Unsafe unsafe = UnsafeHelper.unsafe;
-    private long offset;
-    {
+    private static final Unsafe unsafe = UnsafeHelper.unsafe;
+    private static final long offset;
+    static {
         try {
             offset = unsafe.objectFieldOffset(UnsafeCASCounter.class.getDeclaredField("counter"));
         } catch (NoSuchFieldException e) {
-            e.printStackTrace();
+            throw new IllegalStateException("Whoops!");
         }
     }

如果这样做, unsafeCount 的性能将立即得到提升:

If you do that, the unsafeCount performance boosts right up:

Benchmark                              Mode  Samples   Score    Error   Units
UnsafeCounter_Benchmark.unsafeCount    thrpt        5  9.733 ± 0.673  ops/us

...鉴于误差范围,它现在非常接近同步测试.如果您现在查看 -prof perfasm ,这是一个 unsafeCount 循环:

...which is fairly close to synchronized tests now, given the error bounds. If you look at the -prof perfasm now, this is an unsafeCount loop:

  0.08%    0.02%    0x00007f7575191900: mov    0x10(%r10),%rax       
 28.09%   28.64%    0x00007f7575191904: test   %eax,0x161286f6(%rip) 
  0.23%    0.08%    0x00007f757519190a: mov    %rax,%r11
                    0x00007f757519190d: add    $0x1,%r11
                    0x00007f7575191911: lock cmpxchg %r11,0x10(%r10)
 47.27%   23.48%    0x00007f7575191917: sete   %r8b
  0.10%             0x00007f757519191b: movzbl %r8b,%r8d        
  0.02%             0x00007f757519191f: test   %r8d,%r8d
                    0x00007f7575191922: je     0x00007f7575191900  

此循环非常紧密,似乎没有什么可以使它运行得更快.我们花费大部分时间来加载更新的"值并实际对其进行CAS-ing.但是我们竞争很多!为了弄清楚争用是否是主要原因,让我们添加一些退避:

This loop is very tight, and it seems nothing can make it go faster. We spend most of the time loading the "updated" value and actually CAS-ing it. But we contend a lot! To figure out if contention is the leading cause, let's add backoffs:

--- a/utils bench/src/main/java/org/kirmit/utils/unsafe/concurrency/UnsafeCASCounter.java       
+++ b/utils bench/src/main/java/org/kirmit/utils/unsafe/concurrency/UnsafeCASCounter.java       
@@ -20,6 +21,7 @@ public class UnsafeCASCounter implements Counter {
         long before = counter;
         while (!unsafe.compareAndSwapLong(this, offset, before, before + 1L)) {
             before = counter;
+            Blackhole.consumeCPU(1000);
         }
     }

...运行中:

Benchmark                                 Mode  Samples    Score    Error   Units
UnsafeCounter_Benchmark.unsafeCount      thrpt        5   99.869 ± 107.933  ops/us

Voila.我们在循环中进行了 more 的工作,但这使我们免于进行过多竞争.我曾尝试在"Nanotrusting the Nanotime" 中对此进行解释,返回那里,进一步了解基准测试方法,尤其是在衡量重量级运营时.这突显了整个实验的陷阱,而不仅仅是 unsafeCount .

Voila. We do more work in the loop, but it saves us from contending a lot. I tried to explain this before in "Nanotrusting the Nanotime", it might be good to go back there and read up more on benchmarking methodology, especially when heavy-weight operations are measured. This highlights the pitfall in the entire experiment, not only with unsafeCount.

针对OP的锻炼和感兴趣的读者:解释为什么 unsafeGACount atomicCount 的执行速度比其他测试快得多.您现在有了工具.

Exercise for the OP and interested readers: explain why unsafeGACount and atomicCount perform much faster than other tests. You have the tools now.

P.S.在具有C(C< N)个线程的计算机上运行N个线程是很愚蠢的:您可能认为您对N个线程有争用",但是您只在运行和竞争" C个线程.当人们在4核计算机上执行1000个线程时,这尤其有趣.

P.S. Running N threads on machine having C (C < N) threads is silly: you might think you have "contention" with N threads, but instead you are running and "contending" C threads only. It is especially amusing when people do 1000 threads on 4 core machine...

P.P.S.时间检查:10分钟进行性能分析和其他实验,20分钟将其编写.您浪费了多少时间手动复制结果?;)

P.P.S. Time check: 10 minutes to do the profiling and additional experiments, 20 minutes to write it up. And how much time you wasted replicating the result by hand? ;)

这篇关于通过JMH在sun.misc.Unsafe.compareAndSwap测量中的奇怪行为的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆