Parallelized application is slower


Question

Dear all,

I parallelized a part of my application. From my point of view, the problem is well suited to parallelization: it consists of N independent tasks.

Before parallelization, I observed that my process consumed about 25% CPU on a quad core. My naive assumption was that if I ran, for example, three threads, the quad core should show something around 75%.

Unfortunately that is not the case: after parallelization my application still consumes only about 25% CPU, and on top of that it takes much more time to execute all N jobs. I also checked the three threads and found that each one processed about 1/3 of the jobs.

Do I need to create real parallel processes and not only threads?

Environment: Borland C++ Builder V6 ( :( ).
Note: The jobs use a lot of STL containers for their data. Could that be the problem?

Any ideas?

Thank you very much in advance.
Regards, Idle63

Recommended answer

It's a common misconception that creating multiple threads will automatically improve the performance of a long-running task. As long as you let the OS determine processor affinity, there is no guarantee that the threads will run in parallel.

I'm sure you know that a single core, even when it is running multiple threads, executes only one thread at a time. Add to that the overhead of task switching, stack management, and memory space for the threads, and the threaded version will actually take more time than doing the same work in a single thread.

A computer can only run as many threads in parallel as there are processor cores, and only as long as you set the affinity for each thread to use a different core. The OS chooses the core based on which one is least loaded (I think), so if you don't specify affinity and core 1 is the least busy, it may end up getting all 4 threads.
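As a rough illustration of splitting N independent jobs across no more threads than there are cores, here is a minimal sketch. It assumes a C++11 compiler with `std::thread`, which Borland C++ Builder 6 does not provide; there the same structure would have to be built on the Win32 thread API instead. The names `run_job`, `worker`, `run_jobs_parallel` and `suggested_thread_count` are hypothetical, and `run_job` is only a stand-in for the real work. The sketch does not set thread affinity; it leaves core placement to the OS scheduler.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for one independent job: a CPU-bound computation
// that only touches its own local data (no shared state, no locks).
unsigned long run_job(std::size_t job_index) {
    unsigned long acc = 0;
    for (unsigned long i = 0; i < 100000; ++i) {
        acc += (i ^ job_index) % 7;
    }
    return acc;
}

// Each worker processes the contiguous slice [begin, end) and writes only
// to its own elements of `results`, so no synchronization is required.
void worker(std::size_t begin, std::size_t end,
            std::vector<unsigned long>& results) {
    assert(end <= results.size());
    for (std::size_t j = begin; j < end; ++j) {
        results[j] = run_job(j);
    }
}

// Number of threads to start: the hardware concurrency reported by the
// standard library, or 1 if the implementation cannot determine it.
unsigned suggested_thread_count() {
    unsigned n = std::thread::hardware_concurrency();  // may return 0
    return n == 0 ? 1 : n;
}

// Splits job_count jobs into roughly equal slices, one per thread.
std::vector<unsigned long> run_jobs_parallel(std::size_t job_count,
                                             unsigned thread_count) {
    std::vector<unsigned long> results(job_count, 0);
    if (thread_count == 0) thread_count = 1;

    // Round up so that every job lands in some slice.
    const std::size_t chunk = (job_count + thread_count - 1) / thread_count;

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < thread_count; ++t) {
        const std::size_t begin = std::min(job_count, t * chunk);
        const std::size_t end = std::min(job_count, begin + chunk);
        if (begin == end) break;  // no jobs left for further threads
        threads.emplace_back(worker, begin, end, std::ref(results));
    }
    for (auto& th : threads) th.join();  // wait for all slices to finish
    return results;
}
```

Calling `run_jobs_parallel(job_count, suggested_thread_count())` and timing it against `run_jobs_parallel(job_count, 1)` is one way to check whether the threads actually run side by side. If the threaded run is no faster, one possible cause is that the jobs contend for something shared, such as a lock or the memory allocator, rather than anything to do with affinity.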

