Problem with the timings of a program that uses 1-8 threads on a server that has 4 dual-core CPUs?


Problem description

Hello,

I am running a program on a server at my university that has 4 Dual-Core AMD Opteron(tm) Processor 2210 HE CPUs; the OS is Linux version 2.6.27.25-78.2.56.fc9.x86_64. My program implements Conway's Game of Life and runs using pthreads and OpenMP. I timed the parallel part of the program with the gettimeofday() function, using 1-8 threads. But the timings don't seem right. I get the longest time using 1 thread (as expected), and then the time gets shorter. But the shortest time I get is when I use 4 threads.

Here is an example using a 1000x1000 array.

1 thread: ~9.62 s; 2 threads: ~4.73 s; 3 threads: ~3.64 s; 4 threads: ~2.99 s; 5 threads: ~4.19 s; 6 threads: ~3.84 s; 7 threads: ~3.34 s; 8 threads: ~3.12 s.

The timings above are for pthreads. With OpenMP the timings are shorter but follow the same pattern.

I expected the time to keep decreasing from 1 to 8 threads because of the 4 dual-core CPUs. I thought that because there are 4 CPUs with 2 cores each, 8 threads could run at the same time. Does it have to do with the operating system that the server runs?

I also tested the same programs on another server that has 7 Dual-Core AMD Opteron(tm) Processor 8214 CPUs and runs Linux version 2.6.18-194.3.1.el5. There the timings are what I expected: they get shorter all the way from 1 thread (the longest) to 8 threads (the shortest execution time).

The program implements the Game of Life correctly with both pthreads and OpenMP; I just can't figure out why the timings look like the example I posted. So, in conclusion, my questions are:

1) Does the number of threads that can run at the same time on a system depend on the number of cores in the CPUs? Does it depend only on the number of CPUs, even though each CPU has more than one core? Or does it depend on all of the above plus the operating system?

2) Does it have to do with the way I divide the 1000x1000 array among the threads? But if it did, the OpenMP code wouldn't show the same timing pattern, would it?

3) What could be the reason for such timings?

Excuse my English, I am from Europe... Thanks in advance.

Recommended answer

1 - The number of threads that can effectively run at once is indeed 4 on a 4-core system. But when implemented correctly, you can give spare cycles to another thread to eliminate waits, for example while one thread is waiting for a network message.

2 - It depends on more than simply the number of CPUs. One physical CPU with multiple cores still has to share resources between its cores. This can also be a benefit when working on memory that has already been moved into a cache shared by the cores. It can also depend on the operating system: for now, most of them have the problem of switching threads between cores, which slows everything down. If you plot the executing thread in a graphical line chart of per-core activity, you will see it move from core to core as the lines cross every few seconds. (This also depends on your own implementation.)

3 - When you use one 1000x1000 array for all the threads/cores, a lot of synchronization is needed. Some of this is done by the CPU hardware itself. For example, one core loads data that is not logically shared with another core, but it ends up shared anyway because both items reside in the same cache line. Say cpu-1 needs a byte at address 8 and cpu-2 needs a byte at address 16. With a cache line size of 64 bytes, both live in the same cache line. This is called false sharing. It also applies to a large degree to the locking objects you use. It is not something that can easily be explained in an answer here, but there is information available on it. The most important thing to remember is that multiprocessor execution won't help if each thread constantly depends on data that another thread may have locked. If that thread is itself waiting, it forces the dependent thread to wait as well. You could compare it to 8 people doing dishes in a single sink at the same time. "The more the merrier" does not apply at that point: everyone is forced to slow down instead of speeding up the process.

So, try to separate the tasks more, and make each thread more resource-efficient, especially where shared resources are concerned. Keep in mind the overhead of switching between threads and of storing and loading the resources that come with each switch.

Good luck!


Have a look at this article: Concurrency Hazards: False Sharing

It gives a clear view of what can happen when running a multi-threaded application on a system with multiple CPUs.

