Single-CPU programs running on a Hyper-Threading-enabled quad-core CPU

Question

I'm a researcher in statistical pattern recognition, and I often run simulations that take many days. I'm running Ubuntu 12.04 with Linux 3.2.0-24-generic, which, as I understand it, supports multicore and hyper-threading. On my quad-core Intel Core i7 (Sandy Bridge) with HTT, I often run 4 simulations (long-running programs) at the same time. Before I ask my question, here are the things that I already (think I) know.

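  • My OS (Ubuntu 12.04) detects 8 CPUs because of hyper-threading.
  • The scheduler in my OS is smart enough never to schedule two programs on the two logical (virtual) cores belonging to the same physical core, because the OS supports SMT (simultaneous multithreading).
  • I have read the Wikipedia page on hyper-threading.
  • I have read the HowStuffWorks page on Sandy Bridge.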

OK, my question is as follows. When I run 4 simulations (programs) on my computer at the same time, each of them runs on a separate physical core. However, due to hyper-threading, each physical core is split into two logical cores. Is it therefore true that each physical core is using only half of its full capacity to run each of my simulations?

Thank you very much in advance. If any part of my question is not clear, please let me know.

Recommended answer

This answer is probably late, but I see that nobody has offered an accurate description of what's going on under the hood.

To answer your question: no, one thread will not use only half a core. Only one thread can work inside the core at a time, but that one thread can saturate the core's entire processing power.

Assume thread 1 and thread 2 belong to core #0. Thread 1 can saturate the whole core's processing power while thread 2 waits for it to finish executing. The execution is serialized, not parallel.
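
To check which logical CPUs actually belong to the same physical core, Linux exposes a sibling list for each logical CPU under /sys/devices/system/cpu/cpuN/topology/thread_siblings_list. The following is a minimal Python sketch, not part of the original answer. The helper names (parse_cpu_list, physical_cores, read_sibling_lists) are made up for illustration, and it assumes the sysfs files are present, as they are on typical Linux systems.

```python
import glob
import re


def parse_cpu_list(text):
    """Parse a sysfs CPU list such as "0,4" or "0-3" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def physical_cores(sibling_lists):
    """Group logical CPUs by physical core.

    sibling_lists maps a logical CPU number to the raw contents of its
    thread_siblings_list file. Returns a sorted list of unique sibling tuples.
    """
    return sorted({tuple(parse_cpu_list(raw)) for raw in sibling_lists.values()})


def read_sibling_lists():
    """Read the sibling lists from sysfs (Linux only; returns {} elsewhere)."""
    result = {}
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    for path in glob.glob(pattern):
        cpu = int(re.search(r"cpu(\d+)/topology", path).group(1))
        with open(path) as handle:
            result[cpu] = handle.read()
    return result


if __name__ == "__main__":
    for siblings in physical_cores(read_sibling_lists()):
        print("physical core with logical CPUs:", siblings)
```

On a quad-core CPU with hyper-threading enabled, this would be expected to print four groups of two logical CPUs each; the exact numbering depends on how the kernel enumerates them. A long-running simulation could then be pinned to one logical CPU per group, for example with os.sched_setaffinity on Linux.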

At a glance, that extra thread looks useless. I mean, the core can only process 1 thread at once, right?

Correct, but there are situations in which the core is actually idling, because of 2 important factors:

  • Cache misses
  • Branch mispredictions

Cache misses

When it receives a task, the CPU searches its own cache for the memory addresses it needs to work with. In many scenarios the data is so scattered in memory that it is physically impossible to keep all the required address ranges in the cache (since the cache has a limited capacity).

When the CPU doesn't find what it needs in the cache, it has to access RAM. RAM itself is fast, but it pales in comparison to the CPU's on-die cache. RAM latency is the main issue here.

While RAM is being accessed, the core is stalled. It's not doing anything. You wouldn't notice this through CPU load monitoring software, because all these components work at a ridiculous speed anyway, but the stalls add up. One cache miss after another hampers overall performance quite noticeably. This is where the second thread comes into play. While the core is stalled waiting for data, the second thread moves in to keep the core busy. Thus, you mostly negate the performance impact of core stalls.

I say mostly because the second thread can also stall the core if it hits a cache miss of its own, but the likelihood of 2 threads missing the cache in a row is much lower than that of 1 thread missing.
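
As a rough back-of-the-envelope illustration of that last point (a toy model, not a measurement): suppose each hardware thread independently spends a fraction p of its time stalled on memory. The core then sits idle only when every thread is stalled at once, which happens with probability p^n for n threads. Both the independence assumption and the function name core_busy_fraction are illustrative and do not come from the answer itself.

```python
def core_busy_fraction(stall_fraction, threads):
    """Toy model: fraction of time the core has at least one runnable thread.

    Assumes each thread is stalled independently with probability
    stall_fraction, so the core is idle only when all threads stall together.
    """
    if not 0.0 <= stall_fraction <= 1.0:
        raise ValueError("stall_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return 1.0 - stall_fraction ** threads


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5):
        one = core_busy_fraction(p, 1)
        two = core_busy_fraction(p, 2)
        print(f"stall {p:.0%}: 1 thread busy {one:.0%}, 2 threads busy {two:.0%}")
```

With p = 0.3, for instance, this model gives a busy fraction of 0.70 for one thread and 0.91 for two. Real workloads share caches and other core resources, so actual gains are likely to be smaller than this idealized figure suggests.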

Branch misprediction

Branching is when a code path has more than one possible outcome. The most basic branching code is an if statement. Modern CPUs have branch prediction algorithms built into their hardware which try to predict the execution path of a piece of code. These predictors are actually quite sophisticated, and although I don't have solid data on prediction rates, I do recall reading some articles a while back stating that Intel's Sandy Bridge architecture has an average successful branch prediction rate of over 90%.

When the CPU hits a piece of branching code, it picks one path (the one the predictor thinks is right) and starts executing it. Meanwhile, another part of the core evaluates the branch expression to see whether the predictor was indeed right. This is called speculative execution. It works similarly to 2 different threads: one evaluates the expression, and the other executes one of the possible paths in advance.

From here we have 2 possible scenarios:

  1. The predictor was correct. Execution continues normally from the speculative branch, which was already being executed while the code path was being decided upon.
  2. The predictor was wrong. The entire pipeline that was processing the wrong branch has to be flushed and restarted from the correct branch. OR, the readily available second thread can come in and simply execute while the mess caused by the misprediction is resolved. This is the second use of hyperthreading.

On average, branch prediction speeds up execution considerably, since it has a very high rate of success. But performance does incur quite a penalty when the prediction is wrong.
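
To make the effect of prediction accuracy concrete, here is a small Python simulation of a 2-bit saturating counter, a classic textbook branch predictor. It is only an illustration: real predictors, including the one in Sandy Bridge, are far more elaborate, and the function name and branch pattern below are invented for the example.

```python
def simulate_two_bit_predictor(outcomes, initial_state=2):
    """Count correct predictions of a 2-bit saturating counter.

    States 0-1 predict "not taken", states 2-3 predict "taken".
    Each actual outcome nudges the counter one step toward that outcome.
    """
    state = initial_state
    correct = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken == taken:
            correct += 1
        # Move toward the observed outcome, saturating at 0 and 3.
        state = min(3, state + 1) if taken else max(0, state - 1)
    return correct


if __name__ == "__main__":
    # A loop branch: taken 9 times, then not taken once on exit, repeated 10 times.
    loop_branch = ([True] * 9 + [False]) * 10
    hits = simulate_two_bit_predictor(loop_branch)
    print(f"{hits} of {len(loop_branch)} predicted correctly")
```

For this particular pattern the only misprediction is the loop exit, so 90 of the 100 branches are predicted correctly. Each of the 10 misses corresponds to a point where the pipeline would have to be flushed, which is the window in which a second hardware thread could do useful work.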

Branch misprediction is not a major factor in performance degradation since, like I said, the correct prediction rate is quite high. But cache misses are a problem and will continue to be a problem in certain scenarios.

From my experience, hyperthreading does help out quite a bit with 3D rendering (which I do as a hobby). I've noticed improvements of 20-30%, depending on the size of the scenes and the materials/textures required. Huge scenes use huge amounts of RAM, making cache misses far more likely. Hyperthreading helps a lot in overcoming these misses.
