Which is faster? Less work in more runnables, or more work in fewer runnables? (ExecutorService)

Question

I'm trying to figure out how I can get the maximum performance from a multithreaded app.
I have a thread pool which I created like this:

ExecutorService executor = Executors.newFixedThreadPool(8); // I have 8 CPU cores.  

My question is, should I divide the work into only 8 runnables/callables, which is the same number as the threads in the thread pool, or should I divide it into say 1000000 runnables/callables?

List<Future<Long>> futures = new ArrayList<>();

for (int i = 0; i < 1000000; i++) 
{
    Callable<Long> worker = new MyCallable();  // Each worker does little work.
    futures.add(executor.submit(worker));      // Keep each Future so its result can be collected.
}

long sum = 0;

for (Future<Long> future : futures) 
    sum += future.get();  // Much more overhead from the for loops

OR

List<Future<Long>> futures = new ArrayList<>();

for (int i = 0; i < 8; i++) 
{
    Callable<Long> worker = new MyCallable();  // Each worker does much more work.
    futures.add(executor.submit(worker));      // Keep each Future so its result can be collected.
}

long sum = 0;

for (Future<Long> future : futures) 
    sum += future.get();  // Negligible overhead from the for loops

Dividing the work into 1000000 callables seems slower to me, since there is the overhead of instantiating all these callables and collecting their results in for loops. On the other hand, if I have 8 callables, this overhead is negligible. And since I have only 8 threads, I can't run 1000000 callables at the same time, so there is no performance gain from that.

Am I right or wrong?

BTW, I could test these cases, but the operation is very trivial and I suspect the compiler realizes that and makes some optimizations, so the results might be misleading. I want to know which approach is better for something like an image processing app.

Solution

There are two aspects to this question.

First, there is the technical Java side. As you already have a few answers about this, I'll summarize the basics:

  • If you have N cores, then N threads will give you the best results, as long as each task is purely CPU-bound (i.e. no I/O involved)
  • Each thread should do substantially more work than it costs to create and manage it. E.g. having N threads each count to 10 would be much slower, because the overhead of creating and managing the extra threads is higher than the benefit of counting to 10 in parallel
  • You need to make sure that any synchronization overhead is lower than the work being done, e.g. having N threads call a synchronized increment method would be much slower
  • Threads do take up resources, most commonly memory. The more threads you have, the more difficult it becomes to estimate your memory usage, and it might affect GC timing (rare, but I've seen it happen)
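One way to apply the points above to the image-processing case from the question is sketched here; it is an illustration, not code from the original answer. It assumes a grayscale image stored as a row-major `int[]` of 0-255 values, splits it into row bands (an assumed 4 bands per core) instead of one task per pixel, and lets each task keep a local running total rather than updating a shared synchronized counter. `BandedImageSum`, `processInBands`, and the invert-and-sum operation are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BandedImageSum {
    // Hypothetical operation: invert every pixel of a row-major grayscale image
    // and return the sum of the inverted values.
    static long processInBands(int[] pixels, int width, int height,
                               ExecutorService pool, int bands) throws Exception {
        List<Callable<Long>> tasks = new ArrayList<>();
        int rowsPerBand = (height + bands - 1) / bands; // ceiling division
        for (int startRow = 0; startRow < height; startRow += rowsPerBand) {
            final int from = startRow * width;
            final int to = Math.min(startRow + rowsPerBand, height) * width;
            tasks.add(() -> {
                long local = 0; // per-task accumulator: no shared lock needed
                for (int i = from; i < to; i++) {
                    pixels[i] = 255 - pixels[i]; // bands are disjoint, so writes never overlap
                    local += pixels[i];
                }
                return local;
            });
        }
        long sum = 0;
        // invokeAll blocks until every band has finished.
        for (Future<Long> f : pool.invokeAll(tasks)) {
            sum += f.get();
        }
        return sum;
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores); // one thread per core for CPU-bound work
        try {
            int width = 640, height = 480;
            int[] pixels = new int[width * height]; // all 0, so every pixel becomes 255
            long sum = processInBands(pixels, width, height, pool, cores * 4); // assumed 4 bands per core
            System.out.println(sum);
        } finally {
            pool.shutdown();
        }
    }
}
```

Using a few bands per core rather than exactly one is a judgment call: the per-task overhead stays small, and if one band happens to take longer, the other threads can pick up the remaining bands instead of sitting idle. The best factor depends on the workload and is worth measuring on real images.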

Second, there is scheduling theory. You need to consider what your program is doing:

  • Threads are typically used for blocking I/O operations. You don't want your program to wait on the network or the disk if it could be using the CPU for other tasks
  • There are a few good books on scheduling (I can't remember the names) that can help you design efficient programs. In the example you mention, there might be cases where extra threads make sense, e.g. if your tasks don't have a deterministic duration, are skewed, and your average response time is important. Assume you have 2 cores and 4 tasks. Tasks A & B take 1 minute each, but C & D take 10 minutes each. If you run these on 2 threads with C & D executing first, your total time will be 11 minutes, but your average response time will be (10+10+11+11)/4 = 10.5 minutes. If you execute them on 4 threads, your average response time will be ((1+a)+(1+a)+(10+a)+(10+a))/4 = 5.5+a, where a approximates the scheduling wait time. This is very theoretical because many variables are not accounted for, but it can help in designing threaded programs. (Also, in the example above, since you are waiting on the Futures, you most likely don't care about average response times.)
  • Care must be taken when using multiple thread pools. Using multiple pools can cause deadlocks (if dependencies are introduced between pools) and make it hard to optimize (the pools can contend with each other, and getting the sizes right might become impossible)
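The average-response-time arithmetic in the second bullet can be checked with a small model. This is a sketch under simplifying assumptions: tasks are non-preemptive, each one is handed, in order, to whichever thread frees up first, and every thread is treated as if it had a core to itself, so the 4-thread case corresponds to a = 0. `ResponseTime` and `averageCompletion` are made-up names for illustration.

```java
import java.util.PriorityQueue;

public class ResponseTime {
    // Non-preemptive list scheduling: each task, in the given order, is assigned
    // to the worker that becomes free earliest. Returns the average completion time.
    static double averageCompletion(int workers, int... durations) {
        PriorityQueue<Integer> freeAt = new PriorityQueue<>(); // time at which each worker is free
        for (int w = 0; w < workers; w++) {
            freeAt.add(0);
        }
        long total = 0;
        for (int d : durations) {
            int start = freeAt.poll(); // earliest available worker
            int finish = start + d;
            freeAt.add(finish);
            total += finish;
        }
        return (double) total / durations.length;
    }

    public static void main(String[] args) {
        // C and D (10 min each) queued before A and B (1 min each) on 2 threads.
        System.out.println(averageCompletion(2, 10, 10, 1, 1)); // 10.5
        // Idealized 4 threads, each with its own core (a = 0).
        System.out.println(averageCompletion(4, 10, 10, 1, 1)); // 5.5
        // Same 2 threads, short tasks first.
        System.out.println(averageCompletion(2, 1, 1, 10, 10)); // 6.0
    }
}
```

In this toy model, simply running the short tasks first on 2 threads already brings the average down to 6 minutes, so task ordering can matter as much as thread count when average response time is the goal.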

--EDIT--

Finally, if it helps, the way I think about performance is that I have 4 primary resources: CPU, RAM, disk & network. I try to find which one is my bottleneck and use the non-saturated resources to optimize. For example, if I have lots of idle CPU and low memory, I might compress my in-memory data. If I have lots of disk I/O and large memory, I cache more data. If network resources (not the actual network connection) are slow, I use many threads to parallelize. Once you saturate a resource type on your critical path and can't use other resources to speed it up, you've reached your maximum performance, and you need to upgrade your hardware to get faster results.
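For the slow-network-resources case, a pool larger than the core count can make sense because most threads are blocked waiting rather than using the CPU. This sketch sizes the pool with a commonly cited rule of thumb, threads roughly equal to cores × (1 + wait time / compute time), and uses `Thread.sleep` as a hypothetical stand-in for a slow remote call. `IoBoundPool`, `fetch`, and the 90 ms wait / 10 ms compute ratio are assumptions for illustration, not measured values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class IoBoundPool {
    // Rule of thumb (an estimate, not a law): threads ~= cores * (1 + wait / compute).
    static int poolSize(int cores, long waitMs, long computeMs) {
        return (int) Math.ceil(cores * (1.0 + (double) waitMs / computeMs));
    }

    // Hypothetical slow remote call: almost all of its time is spent blocked.
    static int fetch(int id) throws InterruptedException {
        Thread.sleep(50);
        return id * 2;
    }

    // Submits n blocking calls to the pool and sums their results.
    static int fetchAll(ExecutorService pool, int n) throws Exception {
        List<Callable<Integer>> calls = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            final int id = i;
            calls.add(() -> fetch(id));
        }
        int sum = 0;
        for (Future<Integer> f : pool.invokeAll(calls)) {
            sum += f.get();
        }
        return sum;
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        int threads = poolSize(cores, 90, 10); // assumed: 90 ms waiting per 10 ms of CPU work
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            long t0 = System.nanoTime();
            int sum = fetchAll(pool, 100);
            long ms = (System.nanoTime() - t0) / 1_000_000;
            System.out.println(threads + " threads, sum=" + sum + ", took " + ms + " ms");
        } finally {
            pool.shutdown();
        }
    }
}
```

The wait/compute ratio is the part to measure for a real workload; with a fixed pool of only `cores` threads, the same 100 blocking calls would mostly be waiting in the queue while the CPUs sit idle.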
