How can assigning a variable result in a serious performance drop while the execution order is (nearly) untouched?


Problem Description

When playing around with multithreading, I could observe some unexpected but serious performance issues related to AtomicLong (and classes using it, such as java.util.Random), for which I currently have no explanation. However, I created a minimalistic example, which basically consists of two classes: a class "Container", which holds a volatile long, and a class "DemoThread", which operates on an instance of "Container" during thread execution. Note that the reference to "Container" and the volatile long are private and never shared between threads (I know there's no need to use volatile here; it's just for demonstration purposes) - thus, multiple instances of "DemoThread" should run perfectly in parallel on a multiprocessor machine, but for some reason they do not (the complete example is at the bottom of this post).

private static class Container  {

    private volatile long value;

    public long getValue() {
        return value;
    }

    public final void set(long newValue) {
        value = newValue;
    }
}

private static class DemoThread extends Thread {

    private Container variable;

    public void prepare() {
        this.variable = new Container();
    }

    public void run() {
        for(int j = 0; j < 10000000; j++) {
            variable.set(variable.getValue() + System.nanoTime());
        }
    }
}

During my test, I repeatedly create 4 DemoThreads, which are then started and joined. The only difference between the loops is when prepare() gets called (which is obviously required for the thread to run, as it would otherwise result in a NullPointerException):

DemoThread[] threads = new DemoThread[numberOfThreads];
for(int j = 0; j < 100; j++) {
    boolean prepareAfterConstructor = j % 2 == 0;
    for(int i = 0; i < threads.length; i++) {
        threads[i] = new DemoThread();
        if(prepareAfterConstructor) threads[i].prepare();
    }

    for(int i = 0; i < threads.length; i++) {
        if(!prepareAfterConstructor) threads[i].prepare();
        threads[i].start();
    }
    joinThreads(threads);
}

For some reason, if prepare() is executed immediately before starting the thread, it takes twice as long to finish, and even without the volatile keyword the performance difference was significant on at least two of the machines and operating systems I tested the code on. Here's a short summary:

Java Version: 1.6.0_24
Java Class Version: 50.0
VM Vendor: Sun Microsystems Inc.
VM Version: 19.1-b02-334
VM Name: Java HotSpot(TM) 64-Bit Server VM
OS Name: Mac OS X
OS Arch: x86_64
OS Version: 10.6.5
Processors/Cores: 8

With volatile keyword:
Final results:
31979 ms. when prepare() was called after instantiation.
96482 ms. when prepare() was called before execution.

Without volatile keyword:
Final results:
26009 ms. when prepare() was called after instantiation.
35196 ms. when prepare() was called before execution.

Java Version: 1.6.0_24
Java Class Version: 50.0
VM Vendor: Sun Microsystems Inc.
VM Version: 19.1-b02
VM Name: Java HotSpot(TM) 64-Bit Server VM
OS Name: Windows 7
OS Arch: amd64
OS Version: 6.1
Processors/Cores: 4

With volatile keyword:
Final results:
18120 ms. when prepare() was called after instantiation.
36089 ms. when prepare() was called before execution.

Without volatile keyword:
Final results:
10115 ms. when prepare() was called after instantiation.
10039 ms. when prepare() was called before execution.

Java Version: 1.6.0_20
Java Class Version: 50.0
VM Vendor: Sun Microsystems Inc.
VM Version: 19.0-b09
VM Name: OpenJDK 64-Bit Server VM
OS Name: Linux
OS Arch: amd64
OS Version: 2.6.32-28-generic
Processors/Cores: 4

With volatile keyword:
Final results:
45848 ms. when prepare() was called after instantiation.
110754 ms. when prepare() was called before execution.

Without volatile keyword:
Final results:
37862 ms. when prepare() was called after instantiation.
39357 ms. when prepare() was called before execution.

Test 1, 4 threads, setting variable in creation loop
Thread-2 completed after 653 ms.
Thread-3 completed after 653 ms.
Thread-4 completed after 653 ms.
Thread-5 completed after 653 ms.
Overall time: 654 ms.

Test 2, 4 threads, setting variable in start loop
Thread-7 completed after 1588 ms.
Thread-6 completed after 1589 ms.
Thread-8 completed after 1593 ms.
Thread-9 completed after 1593 ms.
Overall time: 1594 ms.

Test 3, 4 threads, setting variable in creation loop
Thread-10 completed after 648 ms.
Thread-12 completed after 648 ms.
Thread-13 completed after 648 ms.
Thread-11 completed after 648 ms.
Overall time: 648 ms.

Test 4, 4 threads, setting variable in start loop
Thread-17 completed after 1353 ms.
Thread-16 completed after 1957 ms.
Thread-14 completed after 2170 ms.
Thread-15 completed after 2169 ms.
Overall time: 2172 ms.

(and so on; sometimes one or two of the threads in the 'slow' loop finish as expected, but most of the time they don't).

The given example looks theoretical, as it is of no practical use and volatile isn't needed here - however, if you use a java.util.Random instance instead of the Container class and call, for instance, nextInt() multiple times, the same effect occurs: the thread executes fast if you create the object in the thread's constructor, but slowly if you create it within the run() method. I believe the performance issues described in Java Random Slowdowns on Mac OS more than a year ago are related to this effect, but I have no idea why it is the way it is - besides, I'm sure it shouldn't be like that, as it would mean that it's always dangerous to create a new object within the run method of a thread unless you know that no volatile variables are involved in the object graph. Profiling doesn't help, as the problem disappears in that case (same observation as in Java Random Slowdowns on Mac OS cont'd), and it also does not happen on a single-core PC - so I'd guess it's some kind of thread synchronization problem... however, the strange thing is that there's actually nothing to synchronize, as all variables are thread-local.
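For reference, a minimal sketch of the Random-based variant described above might look like the following (the class name, iteration count and printed dummy value are illustrative choices; java.util.Random keeps its seed in an AtomicLong, i.e. a volatile long updated via compare-and-set, which is where the coupling comes in):

private static class RandomDemoThread extends Thread {

    private java.util.Random random;

    public void prepare() {
        // java.util.Random wraps an AtomicLong seed internally
        this.random = new java.util.Random();
    }

    @Override
    public void run() {
        int dummy = 0;
        for(int j = 0; j < 10000000; j++) {
            dummy += random.nextInt();
        }
        System.out.println(this.getName() + " finished (dummy=" + dummy + ")");
    }
}

As with Container, the only difference between the fast and the slow runs is whether prepare() is called right after construction or right before start().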

Really looking forward to any hints - and just in case you want to confirm or falsify the problem, see the test case below.

Thanks,

Stephan

public class UnexpectedPerformanceIssue {

private static class Container  {

    // Remove the volatile keyword, and the problem disappears (on windows)
    // or gets smaller (on mac os)
    private volatile long value;

    public long getValue() {
        return value;
    }

    public final void set(long newValue) {
        value = newValue;
    }
}

private static class DemoThread extends Thread {

    private Container variable;

    public void prepare() {
        this.variable = new Container();
    }

    @Override
    public void run() {
        long start = System.nanoTime();
        for(int j = 0; j < 10000000; j++) {
            variable.set(variable.getValue() + System.nanoTime());
        }
        long end = System.nanoTime();
        System.out.println(this.getName() + " completed after "
                +  ((end - start)/1000000) + " ms.");
    }
}

public static void main(String[] args) {
    System.out.println("Java Version: " + System.getProperty("java.version"));
    System.out.println("Java Class Version: " + System.getProperty("java.class.version"));

    System.out.println("VM Vendor: " + System.getProperty("java.vm.specification.vendor"));
    System.out.println("VM Version: " + System.getProperty("java.vm.version"));
    System.out.println("VM Name: " + System.getProperty("java.vm.name"));

    System.out.println("OS Name: " + System.getProperty("os.name"));
    System.out.println("OS Arch: " + System.getProperty("os.arch"));
    System.out.println("OS Version: " + System.getProperty("os.version"));
    System.out.println("Processors/Cores: " + Runtime.getRuntime().availableProcessors());

    System.out.println();
    int numberOfThreads = 4;

    System.out.println("\nReference Test (single thread):");
    DemoThread t = new DemoThread();
    t.prepare();
    t.run();

    DemoThread[] threads = new DemoThread[numberOfThreads];
    long createTime = 0, startTime = 0;
    for(int j = 0; j < 100; j++) {
        boolean prepareAfterConstructor = j % 2 == 0;
        long overallStart = System.nanoTime();
        if(prepareAfterConstructor) {
            System.out.println("\nTest " + (j+1) + ", " + numberOfThreads + " threads, setting variable in creation loop");             
        } else {
            System.out.println("\nTest " + (j+1) + ", " + numberOfThreads + " threads, setting variable in start loop");
        }

        for(int i = 0; i < threads.length; i++) {
            threads[i] = new DemoThread();
            // Either call DemoThread.prepare() here (in odd loops)...
            if(prepareAfterConstructor) threads[i].prepare();
        }

        for(int i = 0; i < threads.length; i++) {
            // or here (in even loops). Should make no difference, but does!
            if(!prepareAfterConstructor) threads[i].prepare();
            threads[i].start();
        }
        joinThreads(threads);
        long overallEnd = System.nanoTime();
        long overallTime = (overallEnd - overallStart);
        if(prepareAfterConstructor) {
            createTime += overallTime;
        } else {
            startTime += overallTime;
        }
        System.out.println("Overall time: " + (overallTime)/1000000 + " ms.");
    }
    System.out.println("Final results:");
    System.out.println(createTime/1000000 + " ms. when prepare() was called after instantiation.");
    System.out.println(startTime/1000000 + " ms. when prepare() was called before execution.");
}

private static void joinThreads(Thread[] threads) {
    for(int i = 0; i < threads.length; i++) {
        try {
            threads[i].join();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}

}

Answer

It's likely that the two volatile variables a and b are too close to each other and fall into the same cache line; although CPU A only reads/writes variable a and CPU B only reads/writes variable b, they are still coupled to each other through that shared cache line. Such problems are called false sharing.

In your example, we have two allocation schemes:

new Thread                               new Thread
new Container               vs           new Thread
new Thread                               ....
new Container                            new Container
....                                     new Container

In the first scheme, it's very unlikely that the two volatile variables end up close to each other. In the second scheme, it's almost certainly the case.

CPU caches don't work with individual words; instead, they deal with cache lines. A cache line is a contiguous chunk of memory, say 64 neighboring bytes. Usually this is a good thing - if a CPU accesses a cell, it's very likely to access the neighboring cells too. In your example, however, that assumption is not only invalid but detrimental.

Suppose a and b fall in the same cache line L. When CPU A updates a, it notifies the other CPUs that L is dirty. Since B caches L too (because it's working on b), B must drop its cached copy of L. So the next time B needs to read b, it has to reload L, which is costly.

If B has to go all the way to main memory for that reload, it is extremely costly - usually around 100x slower than a cache hit.

Fortunately, A and B can exchange the new values directly without going through main memory. Nevertheless, it takes extra time.

To verify this theory, you can stuff an extra 128 bytes into Container, so that the volatile variables of two Container instances will not fall into the same cache line; you should then observe that the two schemes take about the same time to execute.
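A sketch of that suggestion might look like the following (the field names and the exact amount of padding are illustrative, and field layout is ultimately up to the JVM, so this is a heuristic rather than a guarantee):

private static class PaddedContainer {

    // 8 longs (64 bytes) on each side of the hot field, so that the volatile
    // 'value' fields of two instances should not share a cache line
    private long p1, p2, p3, p4, p5, p6, p7, p8;
    private volatile long value;
    private long q1, q2, q3, q4, q5, q6, q7, q8;

    public long getValue() {
        return value;
    }

    public final void set(long newValue) {
        value = newValue;
    }
}

With this in place of Container, the 'creation loop' and 'start loop' variants should finish in roughly the same time.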

Lesson learned: CPUs usually assume that adjacent variables are related. If we want independent variables, we had better place them far away from each other.
