Why does this code run faster with a lock?


Question

Some background: I created a contrived example to demonstrate the use of VisualVM to my team. In particular, one method had an unnecessary synchronized keyword, and we saw threads in the thread pool blocking where they didn't need to. But removing that keyword had the surprising effect described below. The code below is the simplest case I could reduce the original example to while still reproducing the issue. Using a ReentrantLock produces the same effect.

Please consider the code below (full runnable code example at https://gist.github.com/revbingo/4c035aa29d3c7b50ed8b - you need to add Commons Math 3.4.1 to the classpath). It creates 100 tasks and submits them to a thread pool of 5 threads. In each task, two 500x500 matrices of random values are created and then multiplied.

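import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

import org.apache.commons.math3.linear.MatrixUtils;
import org.apache.commons.math3.linear.RealMatrix;
import org.apache.commons.math3.random.JDKRandomGenerator;
import org.apache.commons.math3.random.StableRandomGenerator;
import org.apache.commons.math3.random.UncorrelatedRandomVectorGenerator;

public class Main {
    private static ExecutorService exec = Executors.newFixedThreadPool(5);

    private final static int MATRIX_SIZE = 500;

    // One generator instance, shared by every task in the pool.
    private static UncorrelatedRandomVectorGenerator generator =
            new UncorrelatedRandomVectorGenerator(MATRIX_SIZE, new StableRandomGenerator(new JDKRandomGenerator(), 0.1d, 1.0d));

    // The (logically unnecessary) lock under discussion.
    private static ReentrantLock lock = new ReentrantLock();

    public static void main(String[] args) throws Exception {

        for (int i = 0; i < 100; i++) {

            exec.execute(new Runnable() {
                @Override
                public void run() {
                    // Fill two matrices row by row from the shared generator.
                    double[][] matrixArrayA = new double[MATRIX_SIZE][MATRIX_SIZE];
                    double[][] matrixArrayB = new double[MATRIX_SIZE][MATRIX_SIZE];
                    for (int j = 0; j < MATRIX_SIZE; j++) {
                        matrixArrayA[j] = generator.nextVector();
                        matrixArrayB[j] = generator.nextVector();
                    }

                    RealMatrix matrixA = MatrixUtils.createRealMatrix(matrixArrayA);
                    RealMatrix matrixB = MatrixUtils.createRealMatrix(matrixArrayB);

                    // Only the multiplication is guarded by the lock.
                    lock.lock();
                    try {
                        matrixA.multiply(matrixB);
                    } finally {
                        lock.unlock();
                    }
                }
            });
        }

        // Stop accepting tasks so the JVM can exit once the queue drains.
        exec.shutdown();
    }
}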

The ReentrantLock is actually unnecessary. There is no shared state between the threads that needs synchronization. With the lock in place, we observe the threads in the thread pool blocking, as expected. With the lock removed, we observe no more blocking, again as expected, and all threads are able to run fully in parallel.

The unexpected result of removing the lock is that the code consistently takes longer to complete: 15-25% longer on my machine (quad-core i7). Profiling the code shows no indication of any blocking or waiting in the threads, and total CPU usage is only around 50%, spread relatively evenly over the cores.

The second unexpected thing is that this also depends on the type of generator used. If I use a GaussianRandomGenerator or UniformRandomGenerator instead of the StableRandomGenerator, the expected result is observed - the code runs faster (by around 10%) when the lock is removed.

If threads are not blocking, the CPU is at a reasonable level, and there is no IO involved, how can this be explained? The only clue I really have is that the StableRandomGenerator invokes a lot of trigonometric functions, so it is clearly a lot more CPU intensive than the Gaussian or Uniform generators. But why, then, am I not seeing the CPU being maxed out?

EDIT: Another important point (thanks Joop) - making the generator local to the Runnable (i.e. one per thread) displays the normal expected behaviour, where adding the lock slows the code by around 50%. So the key conditions for the odd behaviour are a) using a StableRandomGenerator, and b) having that generator be shared between the threads. But to the best of my knowledge, that generator is thread-safe.

EDIT2: Whilst this question is superficially very similar to the linked duplicate question, and the answer is plausible and almost certainly a factor, I'm yet to be convinced it's quite as simple as that. Things that make me question it:

1) The problem only shows up when synchronizing on the multiply() operation, which does not make any calls to Random. My immediate thought was that this synchronization ends up staggering the threads to some extent, and therefore "accidentally" improves the performance of Random#next(). However, synchronizing on the calls to generator.nextVector() (which in theory has the same effect, in the "proper" way) does not reproduce the issue - synchronizing slows the code, as you might expect.

2) The problem is only observed with the StableRandomGenerator, even though the other implementations of NormalizedRandomGenerator also use the JDKRandomGenerator (which, as pointed out, is just a wrapper for java.util.Random). In fact, I replaced use of the RandomVectorGenerator with filling in the matrices via direct calls to Random#nextDouble, and behaviour again reverts to the expected result - synchronizing any part of the code causes the total throughput to drop.

In summary, the problem is only observed when:

a) using StableRandomGenerator - no other implementation of NormalizedRandomGenerator, nor using the JDKRandomGenerator or java.util.Random directly, displays the same behaviour.

b) synchronizing the call to RealMatrix#multiply. The same behaviour is not observed when synchronizing the calls to the random generator.

Recommended answer


You're actually measuring the contention inside a PRNG with shared state.

JDKRandomGenerator is based on java.util.Random, which has a seed that is shared among all your worker threads. The threads compete to update the seed in a compare-and-set loop.
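The compare-and-set loop mentioned above can be sketched roughly as follows. This is a simplified model rather than the JDK source: the class name SeedCasSketch and the retries counter are hypothetical additions for illustration, while the constants are the linear congruential parameters documented for java.util.Random. Each failed compareAndSet means another thread advanced the seed first, so the losing thread has to recompute and try again.

```java
import java.util.concurrent.atomic.AtomicLong;

public class SeedCasSketch {
    // LCG parameters documented for java.util.Random.
    private static final long MULTIPLIER = 0x5DEECE66DL;
    private static final long ADDEND = 0xBL;
    private static final long MASK = (1L << 48) - 1;

    private final AtomicLong seed;
    // Hypothetical counter (not in the JDK): how many CAS attempts failed.
    private final AtomicLong retries = new AtomicLong();

    public SeedCasSketch(long initialSeed) {
        // Same initial scramble that java.util.Random applies to its seed.
        this.seed = new AtomicLong((initialSeed ^ MULTIPLIER) & MASK);
    }

    public int next(int bits) {
        long oldSeed;
        long nextSeed;
        while (true) {
            oldSeed = seed.get();
            nextSeed = (oldSeed * MULTIPLIER + ADDEND) & MASK;
            if (seed.compareAndSet(oldSeed, nextSeed)) {
                break; // this thread won the race
            }
            retries.incrementAndGet(); // another thread got there first
        }
        return (int) (nextSeed >>> (48 - bits));
    }

    public long retries() {
        return retries.get();
    }

    public static void main(String[] args) throws InterruptedException {
        SeedCasSketch shared = new SeedCasSketch(42L);
        Thread[] workers = new Thread[5];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 1_000_000; i++) {
                    shared.next(32);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        System.out.println("CAS retries under contention: " + shared.retries());
    }
}
```

With a single thread the retry count stays at zero. With several threads hammering the same instance it will typically be non-zero, and every retry is wasted work.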

Why does the lock improve performance, then? In fact, it helps to reduce contention inside java.util.Random by serializing the work: while one thread performs matrix multiplication, another is filling its matrices with random numbers. Without the lock, the threads do the same work concurrently.
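One way to check this explanation without Commons Math is to time the same number of draws from a single shared java.util.Random against one instance per worker. The sketch below is only a rough illustration under stated assumptions: the class SharedVsLocalRandom and the method timeDraws are hypothetical names, the thread and draw counts are arbitrary, and a plain System.nanoTime() measurement is not a rigorous benchmark (a harness such as JMH would be more trustworthy).

```java
import java.util.Random;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class SharedVsLocalRandom {

    /** Runs `threads` workers that each draw `draws` doubles; returns elapsed nanoseconds. */
    public static long timeDraws(int threads, int draws, Supplier<Random> source)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    start.await(); // release all workers at the same moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                Random rnd = source.get();
                double sink = 0;
                for (int i = 0; i < draws; i++) {
                    sink += rnd.nextDouble();
                }
                if (sink < 0) {
                    System.out.println(sink); // never true; keeps the loop from being optimized away
                }
            });
        }
        long begin = System.nanoTime();
        start.countDown();
        pool.shutdown();
        if (!pool.awaitTermination(5, TimeUnit.MINUTES)) {
            throw new IllegalStateException("workers did not finish in time");
        }
        return System.nanoTime() - begin;
    }

    public static void main(String[] args) throws InterruptedException {
        Random shared = new Random(1L);
        // Every worker receives the same instance, so all of them fight over one seed.
        long sharedNanos = timeDraws(5, 2_000_000, () -> shared);
        // Every worker receives its own instance, so there is no shared seed.
        long localNanos = timeDraws(5, 2_000_000, () -> new Random(1L));
        System.out.printf("shared: %d ms, per-thread: %d ms%n",
                sharedNanos / 1_000_000, localNanos / 1_000_000);
    }
}
```

If the contention explanation holds, the shared variant should be noticeably slower on a multi-core machine. That would also be consistent with the observation in the question that giving each Runnable its own generator restores the expected behaviour.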
