Parallelized application is slower
Problem description
Dear all
I parallelized a part of my application. From my point of view the problem is ideally suited to parallelization: simply N independent tasks.
Without parallelization I observed that on a quad core my process consumes 25% CPU. My naive assumption was that if I run, e.g., three threads, the quad core should show something around 75%.
Unfortunately it is not like this: after parallelization my application still consumes only about 25% CPU, and on top of that it takes much more time to execute all N jobs. I also checked the three threads and found that each thread processed about 1/3 of the jobs.
Do I need to create real parallel processes and not only threads?
Environment: Borland C++ Builder V6 ( :( ).
Note: the jobs use a lot of STL containers for their data. Maybe that is the problem?
Any ideas?
Thank you very much in advance.
Regards, Idle63
Recommended answer
It's a common misconception that creating multiple threads will automatically improve the performance of a long-running task. As long as you let the OS determine processor affinity, there is no guarantee that the threads will run in parallel.
I'm sure you know that a single core, even when running multiple threads, only runs one thread at a time. Add the overhead of task switching, stack management, and memory space for the threads, and they will actually take more time than just doing the task in a single thread.
A computer can only run as many threads in parallel as there are processor cores, and only as long as you set the affinity for each thread to use a different core. The OS chooses the core based on which one is least loaded (I think), so if core 1 is the least busy, it may get all 4 threads if you don't specify affinity.
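One way to check whether threads actually run in parallel on a given machine is to time a set of independent, CPU-only jobs serially and then again split across a few threads. The following is a minimal sketch, not the poster's code: it assumes a C++11 compiler with `std::thread` (Borland C++ Builder 6 predates that, so there you would need Win32 threads or VCL `TThread` instead), and `run_job`, `run_serial`, and `run_parallel` are hypothetical names invented for illustration. The sketch sets no affinity and leaves thread placement to the OS scheduler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for one independent job: pure CPU work,
// no shared state, no heap allocation, no I/O.
std::uint64_t run_job(std::size_t job_index) {
    std::uint64_t acc = job_index;
    for (int i = 0; i < 100000; ++i) {
        // Unsigned arithmetic wraps around, which is well defined.
        acc = acc * 6364136223846793005ULL + 1442695040888963407ULL;
    }
    return acc;
}

// Run all jobs one after another on the calling thread.
std::vector<std::uint64_t> run_serial(std::size_t job_count) {
    std::vector<std::uint64_t> results(job_count);
    for (std::size_t i = 0; i < job_count; ++i) {
        results[i] = run_job(i);
    }
    return results;
}

// Run the jobs on `thread_count` workers. Worker t handles jobs
// t, t + thread_count, t + 2 * thread_count, ... so each worker
// writes to distinct elements of `results` and no locking is needed.
std::vector<std::uint64_t> run_parallel(std::size_t job_count,
                                        unsigned thread_count) {
    assert(thread_count > 0);
    std::vector<std::uint64_t> results(job_count);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < thread_count; ++t) {
        workers.emplace_back([&results, job_count, thread_count, t] {
            for (std::size_t i = t; i < job_count; i += thread_count) {
                results[i] = run_job(i);
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();  // wait until every worker has finished
    }
    return results;
}
```

To use it, wrap calls such as `run_serial(3000)` and `run_parallel(3000, 3)` in `std::chrono::steady_clock` timestamps and compare the elapsed times while watching CPU usage. If a test like this speeds up with three threads but the real jobs do not, the jobs probably share something that serializes them (for example a lock, a memory allocator, or I/O), and that is worth investigating before switching to separate processes.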