并发程序的性能随着线程的增加而下降吗? [英] Performance of Concurrent Program Degrading with Increase in Threads?

查看:72
本文介绍了并发程序的性能随着线程的增加而下降吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我一直在尝试在四核计算机上实现以下代码,并且Executor服务中的线程数超过100次迭代的平均运行时间如下所示

1个线程= 78404.95

2个线程= 174995.14

4个线程= 144230.23

但是根据我所研究的,线程2*(no of cores)应该为该程序提供最佳结果,这显然不是我的程序中那种奇怪地为单线程提供最佳时间的情况.

代码:

  import java.util.Collections;
import java.util.Random;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class TestHashSet {

    public static void main(String argv[]){
        Set<Integer> S = Collections.newSetFromMap(new ConcurrentHashMap<Integer,Boolean>());
        S.add(1);
        S.add(2);
        S.add(3);
        S.add(4);
        S.add(5);
        long  startTime = System.nanoTime();
        ExecutorService executor = Executors.newFixedThreadPool(8);
        int Nb = 0;
        for(int i = 0;i<10;i++){
            User runnable = new User(S);
            executor.execute(runnable);

            Nb = Thread.getAllStackTraces().keySet().size();
        }
        executor.shutdown();
        try {
            executor.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
        } catch (InterruptedException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        long endTime = System.nanoTime();
        System.out.println(0.001*(endTime-startTime)+" And "+Nb);
    }
}
class User implements Runnable{
    Set<Integer> S;
    User(Set<Integer> S){
        this.S = S;
    }
    @Override
    public void run() {
        // TODO Auto-generated method stub
        Set<Integer> t =Collections.newSetFromMap(new ConcurrentHashMap<Integer,Boolean>());;
        for(int i = 0;i<10;i++){
            t.add(i+5);
        }
        S.retainAll(t);
        Set<Integer> t2 =Collections.newSetFromMap(new ConcurrentHashMap<Integer,Boolean>());;
        for(int i = 0;i<10;i++){
            t2.add(i);
        }
        S.addAll(t);
        /*
        ConcurrentHashSet<Integer> D = new ConcurrentHashSet<Integer>();
        for(int i=0;i<10;i++){
            D.add(i+3);
        }
        S.difference(D);
        */
    }
}

更新:如果我将每个线程的查询数量增加到1000个,则4线程的性能要优于单线程.我认为当我每个线程仅使用约4个查询且没有查询时,开销比运行时要高.现在,增加的运行时大于Overhead.谢谢

解决方案

但是应该增加5个线程来提高性能..

这就是>> you<<认为.但是实际上,不能保证添加线程会提高性能.

但是根据我研究的结果,2 *(无核)线程应该给出最佳结果...

如果您在某处阅读过该书,则可能是您误读了它,或者是完全错误的

现实情况是,获得最佳性能的线程数量高度依赖,取决于应用程序的性质以及运行的硬件.


基于对代码的粗略阅读,看来这是测试Java处理多线程访问和对共享集(S)更新的性能的基准.每个线程都在线程受限集中执行某些操作,然后将线程受限集中的所有条目添加或删除到共享集中.

问题在于addAllretainAll调用可能是并发瓶颈.与基于HashMap的集合相比,基于ConcurrentHashMap的集合将为点访问/更新该集合提供更好的并发性能.但是,addAll和retainAll在其他线程正在操作的相同条目上执行N个此类操作.考虑到这种操作模式的性质,您可能会在ConcurrentHashMap的不同区域内引起重大竞争.这很可能导致一个线程阻塞另一线程……并减慢速度.

更新:如果我增加每个线程的查询数量,则4线程的性能要优于单线程.我认为当我每个线程仅使用约4个查询并且由于没有查询增加时,开销比运行时要高.现在大于开销.

我假设您的意思是您正在增加哈希映射条目的数量.考虑到ConcurrentHashMap的工作方式,这可能会减少平均竞争. (该类将地图划分为多个区域,并安排涉及不同区域中的条目的操作产生的可能的争用开销最小.通过增加不同条目的数量,可以减少两个同时进行的操作导致争用的可能性.)


所以回到"2 x线程数"的事实.

我怀疑您一直在阅读的资料并没有说能为您提供最佳性能.我怀疑他们真的是这么说的:

  • "2 x无线程"是一个很好的起点... 您需要针对您的应用程序/问题/硬件和/或

    <进行调整 /li>
  • 对于计算密集型任务,不要超过"2 x线程数量" ...因为它不太可能帮助.

在您的示例中,争用的主要来源很可能是对共享集/映射的更新……以及确保它们原子发生的开销.

您也可以在较低级别上争用;即争用内存带宽(RAM读/写)和内存缓存争用.是否发生这种情况取决于您所运行的硬件的规格...


最后要注意的是,基准测试存在缺陷,因为它不允许各种虚拟机预热效果……例如JIT编译.您的2个线程时间比1个线程时间长两倍以上的事实表明了该问题.

关于基准测试,还有其他可疑的方面:

  • 通过run()方法完成的工作量太小.

  • 此基准似乎不能代表真实的用例.在完全虚拟的(废话)算法中测量加速不会为您提供线索,在您扩展线程数时,真正的算法可能会如何执行.

  • 在4核计算机上运行测试意味着您可能没有足够的数据点来得出具有科学意义的结论……假设基准是正确的.


话虽如此,您似乎看到的2到4线程速度下降并不是我想不到的.

I have been trying to implement the below code on quad core computer and average running times with No of threads in the Executor service over 100 iterations is as follows

1 thread = 78404.95

2 threads = 174995.14

4 thread = 144230.23

But according to what I have studied 2*(no of cores) of threads should give optimal result for the program which is clearly not the case in my program which bizarrely gives best time for single thread.

Code :

  import java.util.Collections;
import java.util.Random;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class TestHashSet {

    public static void main(String argv[]){
        Set<Integer> S = Collections.newSetFromMap(new ConcurrentHashMap<Integer,Boolean>());
        S.add(1);
        S.add(2);
        S.add(3);
        S.add(4);
        S.add(5);
        long  startTime = System.nanoTime();
        ExecutorService executor = Executors.newFixedThreadPool(8);
        int Nb = 0;
        for(int i = 0;i<10;i++){
            User runnable = new User(S);
            executor.execute(runnable);

            Nb = Thread.getAllStackTraces().keySet().size();
        }
        executor.shutdown();
        try {
            executor.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
        } catch (InterruptedException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        long endTime = System.nanoTime();
        System.out.println(0.001*(endTime-startTime)+" And "+Nb);
    }
}
class User implements Runnable{
    Set<Integer> S;
    User(Set<Integer> S){
        this.S = S;
    }
    @Override
    public void run() {
        // TODO Auto-generated method stub
        Set<Integer> t =Collections.newSetFromMap(new ConcurrentHashMap<Integer,Boolean>());;
        for(int i = 0;i<10;i++){
            t.add(i+5);
        }
        S.retainAll(t);
        Set<Integer> t2 =Collections.newSetFromMap(new ConcurrentHashMap<Integer,Boolean>());;
        for(int i = 0;i<10;i++){
            t2.add(i);
        }
        S.addAll(t);
        /*
        ConcurrentHashSet<Integer> D = new ConcurrentHashSet<Integer>();
        for(int i=0;i<10;i++){
            D.add(i+3);
        }
        S.difference(D);
        */
    }
}

Update : If I increase no of queries per thread to 1000 , 4-threaded is performing better than Single threaded .I think overhead has been higher than run-time when I used only about 4 queries per thread and as no of queries increased Runtime is now greater than Overhead.Thanks

解决方案

But 5 Threads Supposed to increase the performance..?

That's what >>you<< suppose. But in fact, there are no guarantees that adding threads will increase performance.

But according to what I have studied 2*(no of cores) of threads should give optimal result ...

If you read that somewhere, then you either misread it or it is plain wrong.

The reality is that the number of threads for optimal performance is highly dependent on the nature of your application, and also on the hardware you are running on.


Based on a cursory reading of your code, it appears that this is a benchmark to test how well Java deals with multi-threaded access and updates to a shared set (S). Each thread is doing some operations on a thread-confined set, then either adding or removing all entries in the thread-confined set to the shared set.

The problem is that the addAll and retainAll calls are likely to be concurrency bottlenecks. A set based on ConcurrentHashMap will give better concurrent performance for point access / update to the set than on based on HashMap. However, addAll and retainAll perform N such operations, on the same entries that the other threads are operating on. Given the nature of this pattern of operations, you are likely to get significant contention within the different regions of the ConcurrentHashMap. That is likely to lead to one thread blocking another ... and a slowdown.

Update : If I increase no of queries per thread 4-threaded is performing better than Single threaded .I think overhead has been higher than run-time when I used only about 4 queries per thread and as no of queries increased Runtime is now greater than Overhead.

I assume that you mean that you are increasing the number of hash map entries. This is likely to reduce the average contention, given the way that ConcurrentHashMap works. (The class divides the map into regions, and arranges that operations involving entries in different regions incur the minimum possible contention overheads. By increasing the number of distinct entries, you are reducing the probability that two simultaneous operations will lead to contention.)


So returning to the "2 x no of threads" factoid.

I suspect that the sources you have been reading don't actually say that that gives you optimal performance. I suspect that they really say that that:

  • "2 x no of threads" is a good starting point ... and you need to tune it for your application / problem / hardware, and/or

  • don't go above "2 x no of threads" for a compute intensive task ... because it is unlikely to help.

In your example, it is most likely that the main source of the contention is in the updates to the shared set / map ... and the overheads of ensuring that they happen atomically.

You can also get contention at a lower level; i.e. contention for memory bandwidth (RAM read/write) and memory cache contention. Whether that happens will depend on the specs of the hardware you are running on ...


The final thing to note is that your benchmark is flawed in that it does not allow for various VM warmup effects ... such as JIT compilation. The fact that your 2 thread times are more than double the 1 thread times points to that issue.

There are other questionable aspects about your benchmarking:

  • The amount of work done by the run() method is too small.

  • This benchmark does not appear to be representative of a real-world use-case. Measuring speed-up in a totally fictitious (nonsense) algorithm is not going to give you any clues about how a real algorithm is likely to perform when you scale the thread count.

  • Running the tests on a 4 core machine means that you probably wouldn't have enough data points to draw scientifically meaningful conclusions ... assuming that the benchmark was sound.


Having said that, the 2 to 4 thread slowdown that you seem to be seeing is not unexpected ... to me.

这篇关于并发程序的性能随着线程的增加而下降吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆