Hyper-threading and gaming (and other computing applications)?

Question

I was wondering what the real-world performance effects of hyperthreading (multiple logical cores for each physical core) are in different situations. Intel advertises it as effective when threads of execution are waiting for I/O. In memory-intensive applications, however, it can be ineffective because when a switch occurs between logical cores, locality is lost in the processor cache. The second application's data is loaded into cache, forcing the first application's data out of cache. Upon returning to the first application, its references are all cache misses and performance is lost. I know several supercomputer managers who say they turn off hyperthreading because doing so is more efficient in their cases. Are there "normal" user cases where disabling hyperthreading is more efficient? Gaming can be pretty memory intensive: would it be better without hyperthreading?

Recommended answer

First, it should be recognized that hyperthreading is an Intel marketing term labelling Switch-on-Event MultiThreading (on Itanium) and Simultaneous MultiThreading (on x86). SoEMT is primarily beneficial in hiding high latency events such as last level cache misses, is easier to implement, and is friendlier to VLIW-like scheduling. SoEMT is also a better fit for a small L1 (given a somewhat fast L2) than SMT since cache contention is moved more to L2 or L3 (thousands of accesses between thread switches) which can better handle contention given their greater capacity and higher associativity. SMT can be useful in hiding smaller latencies like branch resolution delay or L2 cache hits and provides instruction level parallelism, but introduces more intense contention for resources.

(There is also a difference between disabling hyperthreading and not using hyperthreading. Disabling hyperthreading might provide a small performance benefit in that some shareable resources will be used even by an inactive but enabled thread and some partitioned resources may still use a small amount of power, but the primary benefit would be in preventing the OS from making disruptive scheduling decisions.)

For "normal" code, the available thread-level parallelism may well be lower than the number of cores available. In that case, a modern OS typically will not use the hardware multithreading since it recognizes that a full core has more performance than a core shared by more than one thread. (Sharing a core can theoretically improve performance in special cases where using L1 to communicate between threads is unusually helpful. In addition, waking an inactive thread on an active core is much faster and requires less energy than waking up a core, so using multithreading might be helpful for energy efficiency in some special cases.)
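
The scheduling preference described above can also be applied by hand: place one software thread on each physical core before using any sibling hardware thread. The following is a minimal sketch, assuming a Linux system that exposes `thread_siblings_list` files under `/sys/devices/system/cpu`. The helper names (`parse_cpu_list`, `one_cpu_per_core`, `read_sibling_lists`) are hypothetical and not part of any library.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a sysfs CPU list such as "0,4" or "2-3" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def one_cpu_per_core(sibling_lists):
    """Given {logical_cpu: siblings_text}, keep the lowest-numbered sibling of each core."""
    return sorted({parse_cpu_list(text)[0] for text in sibling_lists.values()})


def read_sibling_lists(root="/sys/devices/system/cpu"):
    """Read thread_siblings_list for every cpuN directory (assumed Linux sysfs layout)."""
    result = {}
    for path in Path(root).glob("cpu[0-9]*/topology/thread_siblings_list"):
        cpu = int(path.parent.parent.name[3:])  # directory name is "cpuN"
        result[cpu] = path.read_text()
    return result
```

On Linux, the list returned by `one_cpu_per_core(read_sibling_lists())` could be passed to `os.sched_setaffinity(0, ...)` to keep a process off sibling hardware threads without disabling hyperthreading in firmware. Whether that helps depends on the workload.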

HPC codes tend to be the worst case for SMT. HPC code is more likely to be friendly to static scheduling, so the latency-hiding benefits of SMT tend to be minimized. (Similarly, HPC code tends to benefit less from out-of-order execution.) HPC code also tends to be constrained by memory bandwidth rather than memory latency. SMT can increase the bandwidth demand per unit of execution (by increasing cache misses) and reduce the achieved memory bandwidth through contention at the memory controller. (DRAM is not friendly to random access; interleaved access streams cause extra row activate and precharge cycles.) SMT may also cause the number of active data streams to exceed the hardware's support for prefetching. HPC code is also more likely to be blocked according to cache sizes assuming one thread per core; in such cases SMT will produce significant cache thrashing.
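
The cache-blocking point can be made concrete with a rough calculation. This is a sketch only: it assumes a hypothetical 32 KiB L1 data cache, three square tiles of 8-byte elements per thread (as in a blocked matrix multiply), and it ignores associativity and everything else competing for the cache.

```python
import math


def tile_edge(cache_bytes, threads_per_core=1, arrays=3, elem_bytes=8):
    """Largest square tile edge such that `arrays` tiles fit in one thread's cache share."""
    per_thread = cache_bytes // threads_per_core
    return math.isqrt(per_thread // (arrays * elem_bytes))


def working_set(edge, arrays=3, elem_bytes=8):
    """Bytes touched by one thread working on `arrays` tiles of edge x edge elements."""
    return arrays * edge * edge * elem_bytes


L1_BYTES = 32 * 1024                              # assumed L1 data cache size
edge_one_thread = tile_edge(L1_BYTES)             # tile tuned for one thread per core
edge_two_threads = tile_edge(L1_BYTES, threads_per_core=2)

# Two SMT threads that both use the single-thread tile size overflow the cache.
combined = 2 * working_set(edge_one_thread)
print(edge_one_thread, edge_two_threads, combined > L1_BYTES)
```

Under these assumptions, code tuned for one thread per core uses a tile that nearly fills the cache on its own. A second thread with the same tile size roughly doubles the working set, which is the thrashing case described above.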

Disabling hyperthreading may also be friendlier to gang-scheduled operation, which is common in HPC. If only some of the cores are using multithreading, those cores might have higher performance per core yet would have lower performance per thread; that forces other cores to idly wait for the slowed threads to complete. (HPC systems may have dedicated OS cores and spare cores to avoid similar problems, where OS activity would slow down one core/thread and force hundreds of others to wait or where a failed core could cause, e.g., a 16-thread gang scheduled program to run 15 threads and then one thread, doubling execution time.)
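
A toy model of the gang-scheduling effect, under stated assumptions: every thread has the same amount of work, a step ends when the slowest thread finishes, and a thread sharing a core runs at 65% of full-core speed. The 65% figure is an illustrative assumption, not a measurement, and the function names are hypothetical.

```python
import math


def gang_step_time(thread_speeds, work_per_thread=1.0):
    """A gang-scheduled step finishes only when its slowest thread finishes."""
    return max(work_per_thread / speed for speed in thread_speeds)


def rounds_needed(threads, cores):
    """Without SMT, a gang larger than the core count must run in several rounds."""
    return math.ceil(threads / cores)


# 16-thread gang on 15 working cores with SMT: 14 threads get a full core,
# 2 threads share one core at an assumed 65% of full speed each.
speeds = [1.0] * 14 + [0.65] * 2
step = gang_step_time(speeds)

# Fraction of the step that each full-speed core spends waiting for the slow pair.
idle_fraction = 1.0 - (1.0 / 1.0) / step

print(round(step, 3), round(idle_fraction, 2), rounds_needed(16, 15))
```

In this model the two slowed threads set the pace for the whole gang, so the other fourteen cores sit idle for part of every step. Without SMT, the failed-core case in the parenthetical above needs two rounds instead of one.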

(In theory, SMT could be used in HPC to reduce register pressure in some optimized loops since the effective latency of operations like FMADD in a dual threaded core may be viewed as roughly being halved. Since compilers generally use a fixed latency for scheduling [SMT is treated as a transparent feature], exploiting this feature is not generally practical even when it could be beneficial.)
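
A back-of-the-envelope sketch of the register-pressure argument. To keep a pipelined FMA unit busy, a loop needs roughly latency times issue rate independent accumulators in flight. If two threads alternate on the same units, each thread only has to cover half of that. The latency and issue-rate numbers are hypothetical, and `accumulators_needed` is an illustrative helper.

```python
import math


def accumulators_needed(latency_cycles, fma_per_cycle, threads_per_core=1):
    """Independent accumulators per thread needed to cover FMA latency."""
    in_flight = latency_cycles * fma_per_cycle  # operations needed to fill the pipeline
    return math.ceil(in_flight / threads_per_core)


# Hypothetical core: 4-cycle FMA latency, 2 FMAs issued per cycle.
single = accumulators_needed(4, 2)                      # one thread covers all latency
dual = accumulators_needed(4, 2, threads_per_core=2)    # two threads split the burden
print(single, dual)
```

A compiler that schedules for the fixed single-thread latency would still unroll for the larger count, which is why the answer treats this benefit as mostly theoretical.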

Rather like out-of-order execution, SMT is most beneficial for irregular code. (OoO looks ahead in a single code stream for instruction level and memory level parallelism; SMT looks "sideways" across threads for such parallelism.) If branch mispredictions and cache misses are common, SMT can use existing thread-level parallelism to hide such latencies (the cost of a branch misprediction is largely in the latency of resolution).

The benefit from SMT varies by workload and by the specific hardware. A deeply pipelined in-order microarchitecture like the initial Intel Atom benefits more from SMT than a shallower pipelined OoO microarchitecture would (latencies, especially branch resolution latency, being generally higher with longer pipelines and OoO providing some parallelism that would otherwise be used by SMT's thread-level parallelism).

Enabling hyperthreading may also increase the number of threads an application uses. If performance scales sublinearly enough with thread count, the lower per-thread performance under hyperthreading can result in a net loss. E.g., if two-thread-per-core hyperthreading provided a 30% increase in per-core performance (so each thread runs at 65% of a full core) and doubling the thread count increased performance by 50%, then total performance would decrease by 2.5% (1.5 × 0.65 = 0.975).
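
The arithmetic in this example can be checked with a few lines. This is a sketch of the simple model used in the paragraph above, in which performance is the thread-count scaling factor multiplied by per-thread speed. The function names are illustrative.

```python
def total_with_smt(smt_core_gain, doubling_speedup):
    """Relative performance after doubling threads onto SMT siblings.

    smt_core_gain: per-core throughput gain from running two threads (0.30 = 30%).
    doubling_speedup: speedup the application would get from doubling its thread
    count if every thread ran at full-core speed (1.5 = 50% faster).
    """
    per_thread_speed = (1.0 + smt_core_gain) / 2.0  # two threads share one core
    return doubling_speedup * per_thread_speed


def break_even_gain(doubling_speedup):
    """SMT per-core gain at which doubling the thread count neither helps nor hurts."""
    return 2.0 / doubling_speedup - 1.0


result = total_with_smt(0.30, 1.50)
print(f"relative performance: {result:.3f}")
print(f"break-even SMT gain: {break_even_gain(1.50):.1%}")
```

With the numbers from the text, the result is about 0.975, the 2.5% loss. Under the same model, SMT would need to add roughly a third to per-core throughput before doubling the thread count breaks even.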

The standard advice of "when in doubt, measure" obviously applies.
