Will 8 logical threads on 4 cores run at most 4 times faster in parallel?


Question

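I am benchmarking software that executes 4x faster than my serial version on an Intel 2670QM when using all 8 of my 'logical' threads. I would like some community feedback on my reading of the benchmark results.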

When I use 4 threads on 4 cores I get a speedup of 4x; the entire algorithm is executed in parallel. This seems logical to me since Amdahl's law predicts it. Windows Task Manager tells me I'm using 50% of the CPU.
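For reference, Amdahl's law in its usual textbook form is written out below. The symbols are illustrative and not taken from the question: p is the fraction of the run time that can be parallelised and n is the number of independent execution units.

```latex
S(n) = \frac{1}{(1 - p) + \dfrac{p}{n}},
\qquad
p = 1,\; n = 4 \;\Longrightarrow\; S(4) = \frac{1}{0 + \tfrac{1}{4}} = 4
```

With the whole algorithm parallel (p = 1), the formula gives the measured 4x on 4 cores. Note that it counts n as independent execution units, so whether 8 logical threads may be plugged in as n = 8 is exactly the point in question.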

However, if I execute the same software on all 8 threads, I once again get a speedup of 4x, not 8x.

If I have understood this correctly: my CPU has 4 cores, each with a frequency of 2.2 GHz, but the frequency is divided to 1.1 GHz when applied to 8 'logical' threads, and the same goes for the other components such as the cache memory? If this is true, then why does Task Manager claim only 50% of my CPU is being used?

#define NumberOfFiles 8
...
char startLetter ='a';
#pragma omp parallel for shared(startLetter)
for(int f=0; f<NumberOfFiles; f++){
    ...
}

I am not including the time spent on disk I/O. I am only interested in the time an STL call (STL sort) takes, not the disk I/O.
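A minimal sketch of how such a measurement could be set up is shown below. It is not the asker's actual code: the loop body in the question is elided, so `makeInputs` and `timeSort` are hypothetical helpers, and the "files" are random vectors generated in memory so that only the `std::sort` calls are timed.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for the input files: one vector of random
// integers per "file", built in memory so no disk I/O is involved.
std::vector<std::vector<int>> makeInputs(int numberOfFiles, std::size_t n) {
    std::vector<std::vector<int>> inputs(numberOfFiles);
    std::mt19937 rng(42);  // fixed seed, so every run sorts the same data
    std::uniform_int_distribution<int> dist(0, 1000000);
    for (auto& v : inputs) {
        v.resize(n);
        for (auto& x : v) x = dist(rng);
    }
    return inputs;
}

// Sorts a private copy of the inputs using `threads` OpenMP threads and
// returns the wall-clock seconds spent in the parallel loop only.
// Without OpenMP enabled the pragma is ignored and the loop runs serially.
double timeSort(std::vector<std::vector<int>> inputs, int threads) {
    assert(threads > 0);
    const int numberOfFiles = static_cast<int>(inputs.size());
    const auto start = std::chrono::steady_clock::now();
    #pragma omp parallel for num_threads(threads)
    for (int f = 0; f < numberOfFiles; f++) {
        std::sort(inputs[f].begin(), inputs[f].end());
    }
    const auto stop = std::chrono::steady_clock::now();
    // Sanity check outside the timed region: every vector is now sorted.
    for (const auto& v : inputs) {
        assert(std::is_sorted(v.begin(), v.end()));
    }
    return std::chrono::duration<double>(stop - start).count();
}
```

Dividing `timeSort(inputs, 1)` by `timeSort(inputs, 4)` and by `timeSort(inputs, 8)` gives the two speedups being compared. With GCC, OpenMP is enabled with `-fopenmp`. Each call sorts its own copy of the data, so no run benefits from input that an earlier run already sorted.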

Recommended answer

An i7-2670QM processor has 4 cores, but it can run 8 threads in parallel. This means that it has only 4 processing units (cores) but has hardware support for running 8 threads in parallel. At most four jobs execute on the cores at once; if one of the jobs stalls, for example on a memory access, another thread can start executing on the free core very quickly and with very little penalty. Read more on Hyper-Threading. In reality there are few scenarios where Hyper-Threading gives a large performance gain. More modern processors handle Hyper-Threading better than older processors.

Your benchmark showed that it was CPU bound, i.e. there were few stalls in the pipeline that would have given Hyper-Threading an advantage. 50% CPU is correct, as the 4 cores are working and the 4 extra logical processors are not doing anything. Turn off Hyper-Threading in the BIOS and you will see 100% CPU.
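The 50% and 100% figures follow from simple arithmetic, sketched below. This assumes the monitor averages load over all logical processors; that is an illustrative model, not a documented description of how Task Manager computes its number, and `reportedUtilizationPercent` is a hypothetical helper.

```cpp
#include <cassert>

// Percentage a monitor would show under the assumption that it divides
// the number of busy logical processors by the total number of logical
// processors the operating system sees.
double reportedUtilizationPercent(int busyLogical, int totalLogical) {
    assert(totalLogical > 0);
    assert(busyLogical >= 0 && busyLogical <= totalLogical);
    return 100.0 * busyLogical / totalLogical;
}
```

Under this model, 4 busy threads on 8 logical processors give `reportedUtilizationPercent(4, 8)`, which is 50. With Hyper-Threading disabled the same 4 threads run on 4 logical processors, and `reportedUtilizationPercent(4, 4)` is 100.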
