Java线程创建开销 [英] Java thread creation overhead

查看:141
本文介绍了Java线程创建开销的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

传统智慧告诉我们,大批量企业java应用程序应优先使用线程池来生成新的工作线程。使用 java.util.concurrent 可以直截了当。

Conventional wisdom tells us that high-volume enterprise java applications should use thread pooling in preference to spawning new worker threads. The use of java.util.concurrent makes this straightforward.

然而,确实存在线程池的情况不合适。我目前正在努力的具体示例是使用 InheritableThreadLocal ,它允许 ThreadLocal 变量传递下来 对任何产生的线程。这个机制在使用线程池时会中断,因为工作线程通常不是从请求线程中生成的,而是预先存在的。

There do exist situations, however, where thread pooling is not a good fit. The specific example which I am currently wrestling with is the use of InheritableThreadLocal, which allows ThreadLocal variables to be "passed down" to any spawned threads. This mechanism breaks when using thread pools, since the worker threads are generally not spawned from the request thread, but are pre-existing.

现在有办法解决这个问题(线程本地可以显式传入),但这并不总是合适或实用。最简单的解决方案是按需生成新的工作线程,让 InheritableThreadLocal 完成它的工作。

Now there are ways around this (the thread locals can be explicitly passed in), but this isn't always appropriate or practical. The simplest solution is to spawn new worker threads on demand, and let InheritableThreadLocal do its job.

这给我们带来了回到问题 - 如果我有一个高容量站点,用户请求线程每个产生六个工作线程(即不使用线程池),这是否会给JVM带来问题?我们可能会谈论每秒创建几百个新线程,每个线程持续不到一秒钟。现代JVM是否能很好地优化这一点?我记得Java中需要对象池的日子,因为对象创建很昂贵。从此变得不必要了。我想知道是否同样适用于线程池。

This brings us back to the question - if I have a high volume site, where user request threads are spawning off half a dozen worker threads each (i.e. not using a thread pool), is this going to give the JVM a problem? We're potentially talking about a couple of hundred new threads being created every second, each one lasting less than a second. Do modern JVMs optimize this well? I remember the days when object pooling was desirable in Java, because object creation was expensive. This has since become unnecessary. I'm wondering if the same applies to thread pooling.

如果我知道要测量什么,我会对它进行基准测试,但我担心问题可能更多比分析器可以测量的微妙。

I'd benchmark it, if I knew what to measure, but my fear is that the problems may be more subtle than can be measured with a profiler.

注意:使用线程本地的智慧不是问题所在,所以请不要建议我不要使用它们。

Note: the wisdom of using thread locals is not the issue here, so please don't suggest that I not use them.

推荐答案

以下是微基准测试的示例:

Here is an example microbenchmark:

public class ThreadSpawningPerformanceTest {
static long test(final int threadCount, final int workAmountPerThread) throws InterruptedException {
    Thread[] tt = new Thread[threadCount];
    final int[] aa = new int[tt.length];
    System.out.print("Creating "+tt.length+" Thread objects... ");
    long t0 = System.nanoTime(), t00 = t0;
    for (int i = 0; i < tt.length; i++) { 
        final int j = i;
        tt[i] = new Thread() {
            public void run() {
                int k = j;
                for (int l = 0; l < workAmountPerThread; l++) {
                    k += k*k+l;
                }
                aa[j] = k;
            }
        };
    }
    System.out.println(" Done in "+(System.nanoTime()-t0)*1E-6+" ms.");
    System.out.print("Starting "+tt.length+" threads with "+workAmountPerThread+" steps of work per thread... ");
    t0 = System.nanoTime();
    for (int i = 0; i < tt.length; i++) { 
        tt[i].start();
    }
    System.out.println(" Done in "+(System.nanoTime()-t0)*1E-6+" ms.");
    System.out.print("Joining "+tt.length+" threads... ");
    t0 = System.nanoTime();
    for (int i = 0; i < tt.length; i++) { 
        tt[i].join();
    }
    System.out.println(" Done in "+(System.nanoTime()-t0)*1E-6+" ms.");
    long totalTime = System.nanoTime()-t00;
    int checkSum = 0; //display checksum in order to give the JVM no chance to optimize out the contents of the run() method and possibly even thread creation
    for (int a : aa) {
        checkSum += a;
    }
    System.out.println("Checksum: "+checkSum);
    System.out.println("Total time: "+totalTime*1E-6+" ms");
    System.out.println();
    return totalTime;
}

public static void main(String[] kr) throws InterruptedException {
    int workAmount = 100000000;
    int[] threadCount = new int[]{1, 2, 10, 100, 1000, 10000, 100000};
    int trialCount = 2;
    long[][] time = new long[threadCount.length][trialCount];
    for (int j = 0; j < trialCount; j++) {
        for (int i = 0; i < threadCount.length; i++) {
            time[i][j] = test(threadCount[i], workAmount/threadCount[i]); 
        }
    }
    System.out.print("Number of threads ");
    for (long t : threadCount) {
        System.out.print("\t"+t);
    }
    System.out.println();
    for (int j = 0; j < trialCount; j++) {
        System.out.print((j+1)+". trial time (ms)");
        for (int i = 0; i < threadCount.length; i++) {
            System.out.print("\t"+Math.round(time[i][j]*1E-6));
        }
        System.out.println();
    }
}
}

64位结果在Intel Core2 Duo E6400 @ 2.13 GHz上使用32位Sun的Java 1.6.0_21客户端VM的Windows 7如下所示:

The results on 64-bit Windows 7 with 32-bit Sun's Java 1.6.0_21 Client VM on Intel Core2 Duo E6400 @2.13 GHz are as follows:

Number of threads  1    2    10   100  1000 10000 100000
1. trial time (ms) 346  181  179  191  286  1229  11308
2. trial time (ms) 346  181  187  189  281  1224  10651

结论:由于我的计算机有两个内核,因此两个线程的工作速度几乎是一个线程的两倍。我的电脑每秒可以生成近10000个线程,i。即线程创建开销为0.1毫秒。因此,在这样的机器上,每秒几百个新线程构成可忽略的开销(通过比较2和100个线程的列中的数字也可以看出)。

Conclusions: Two threads do the work almost twice as fast as one, as expected since my computer has two cores. My computer can spawn nearly 10000 threads per second, i. e. thread creation overhead is 0.1 milliseconds. Hence, on such a machine, a couple of hundred new threads per second pose a negligible overhead (as can also be seen by comparing the numbers in the columns for 2 and 100 threads).

这篇关于Java线程创建开销的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆