multi-CPU, multi-core and hyper-thread

Question

Could anyone recommend some documents to me to illustrate the differences between multi-CPU, multi-core, and hyper-thread? I am always confused about these differences, and about the pros/cons of each architecture in different scenarios.

Here is my current understanding after learning online and from others' comments; could anyone review it and comment, please?

  1. I think hyper-threading is the weakest of these technologies, but it is cheap. Its main idea is to duplicate registers in order to save context-switch time;
  2. A multi-processor system is better than hyper-threading, but because the CPUs sit on separate chips, communication between them has higher latency than in a multi-core design, and using several chips costs more and consumes more power than multi-core;
  3. Multi-core integrates all the CPUs on a single chip, so the communication latency between CPUs is greatly reduced compared with a multi-processor system. Since a single chip contains all the CPUs, it also consumes less power and costs less than a multi-processor system.

Thanks in advance, George

Answer

Multi-CPU was the first version: You'd have one or more mainboards with one or more CPU chips on them. The main problem here was that the CPUs would have to expose some of their internal data to the other CPU so they wouldn't get in each other's way.

The next step was hyper-threading. One chip on the mainboard but it had some parts twice internally so it could execute two instructions at the same time.

The current development is multi-core. It's basically the original idea (several complete CPUs) but in a single chip. The advantage: Chip designers can easily put the additional wires for the sync signals into the chip (instead of having to route them out on a pin, then over the crowded mainboard and up into a second chip).

Super computers today are multi-cpu, multi-core: They have lots of mainboards with usually 2-4 CPUs on them, each CPU is multi-core and each has its own RAM.

You got that pretty much right. Just a few minor points:

  • Hyper-threading keeps track of two contexts at once in a single core, exposing more parallelism to the out-of-order CPU core. This keeps the execution units fed with work, even when one thread is stalled on a cache miss, branch mispredict, or waiting for results from high-latency instructions. It's a way to get more total throughput without replicating much hardware, but if anything it slows down each thread individually. See this Q&A for more details, and an explanation of what was wrong with the previous wording of this paragraph.
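
To see this from software, here is a minimal sketch (assuming Linux, where the sysfs topology files below exist; those paths are not part of the original answer) that prints how many logical processors the standard library reports and which logical CPUs are hyper-thread siblings of the same physical core:

```cpp
// Sketch: count logical CPUs and show which ones are hyper-thread siblings.
// Assumes Linux; the sysfs topology files may not exist on other systems.
#include <fstream>
#include <iostream>
#include <string>
#include <thread>

int main() {
    // Number of logical processors (may report 0 if unknown).
    unsigned logical = std::thread::hardware_concurrency();
    std::cout << "logical processors: " << logical << '\n';

    // Each file lists the logical CPUs sharing one physical core,
    // e.g. "0,4" means CPU 0 and CPU 4 are hyper-thread siblings.
    for (unsigned cpu = 0; cpu < logical; ++cpu) {
        std::ifstream f("/sys/devices/system/cpu/cpu" + std::to_string(cpu) +
                        "/topology/thread_siblings_list");
        std::string siblings;
        if (f && std::getline(f, siblings))
            std::cout << "cpu" << cpu << " shares a core with: " << siblings << '\n';
    }
}
```

On a hyper-threaded machine the logical count is typically twice the number of physical cores, and each core's siblings file lists two CPU numbers.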

The main problem with multi-CPU is that code running on them will eventually access the RAM. There are N CPUs but only one bus to access the RAM. So you must have some hardware which makes sure that a) each CPU gets a fair amount of RAM access, b) that accesses to the same part of the RAM don't cause problems and c) most importantly, that CPU 2 will be notified when CPU 1 writes to some memory address which CPU 2 has in its internal cache. If that doesn't happen, CPU 2 will happily use the cached value, oblivious to the fact that it is outdated.
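
At the language level the same hazard shows up as a data race. The sketch below (illustrative only, not from the original answer) uses std::atomic so the reader is guaranteed to eventually see the writer's store; with a plain bool the reader could legally spin on a stale value forever:

```cpp
// Sketch: cross-thread visibility. The atomic flag guarantees that once the
// reader sees ready == true, it also sees the data written before it.
// With plain (non-atomic) variables this would be a data race and the
// reader could keep using a stale value indefinitely.
#include <atomic>
#include <iostream>
#include <thread>

int data = 0;
std::atomic<bool> ready{false};

int main() {
    std::thread writer([] {
        data = 42;                                     // write the payload first
        ready.store(true, std::memory_order_release);  // then publish it
    });
    std::thread reader([] {
        while (!ready.load(std::memory_order_acquire)) { /* spin */ }
        std::cout << "reader saw data = " << data << '\n';  // prints 42
    });
    writer.join();
    reader.join();
}
```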

Just imagine you have tasks in a list and you want to spread them to all available CPUs. So CPU 1 will fetch the first element from the list and update the pointers. CPU 2 will do the same. For efficiency reasons, both CPUs will not only copy the few bytes into the cache but a whole "cache line" (whatever that may be). The assumption is that, when you read byte X, you'll soon read X+1, too.
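
The cache-line granularity has a directly measurable consequence: two threads that update different variables still slow each other down if those variables happen to share a line ("false sharing"). A rough sketch, assuming a 64-byte cache line (common on x86, but an assumption here); the padded version is usually noticeably faster:

```cpp
// Sketch: false sharing. Two threads update *different* counters, but if the
// counters live in the same cache line, that line bounces between the cores.
// Padding each counter to its own (assumed 64-byte) line avoids the bouncing.
#include <atomic>
#include <chrono>
#include <iostream>
#include <thread>

struct Padded { alignas(64) std::atomic<long> value{0}; };  // one line each

template <typename Counters>
long long run(Counters& c) {
    auto t0 = std::chrono::steady_clock::now();
    std::thread a([&] { for (int i = 0; i < 10'000'000; ++i) c[0].value++; });
    std::thread b([&] { for (int i = 0; i < 10'000'000; ++i) c[1].value++; });
    a.join(); b.join();
    return std::chrono::duration_cast<std::chrono::milliseconds>(
               std::chrono::steady_clock::now() - t0).count();
}

int main() {
    struct Packed { std::atomic<long> value{0}; } packed[2];  // same cache line
    Padded padded[2];                                         // separate lines
    std::cout << "shared line:    " << run(packed) << " ms\n";
    std::cout << "separate lines: " << run(padded) << " ms\n";
}
```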

Now both CPUs have a copy of the memory in their cache. CPU 1 will then fetch the next item from the list. Without cache sync, it won't have noticed that CPU 2 has changed the list, too, and it will start to work on the same item as CPU 2.
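
For completeness, here is a minimal sketch (not from the original answer) of how that task-list scenario is usually written safely: an atomic counter hands out each index exactly once, so two threads can never end up processing the same item:

```cpp
// Sketch: several threads pulling tasks from one list safely.
// fetch_add hands out each index exactly once, so no item is taken twice.
#include <atomic>
#include <iostream>
#include <thread>
#include <vector>

int main() {
    std::vector<int> tasks(100);
    for (int i = 0; i < 100; ++i) tasks[i] = i;

    std::atomic<size_t> next{0};
    std::atomic<long> sum{0};

    auto worker = [&] {
        for (;;) {
            size_t i = next.fetch_add(1);   // claim one index atomically
            if (i >= tasks.size()) break;   // list exhausted
            sum += tasks[i];                // "process" the task
        }
    };

    std::thread a(worker), b(worker);
    a.join(); b.join();
    std::cout << "sum = " << sum << " (expected " << 99 * 100 / 2 << ")\n";
}
```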

This is what effectively makes multi-CPU so complicated. Side effects of this can lead to a performance which is worse than what you'd get if the whole code ran only on a single CPU. The solution was multi-core: You can easily add as many wires as you need to synchronize the caches; you could even copy data from one cache to another (updating parts of a cache line without having to flush and reload it), etc. Or the cache logic could make sure that all CPUs get the same cache line when they access the same part of real RAM, simply blocking CPU 2 for a few nanoseconds until CPU 1 has made its changes.

The main reason why multi-core is simpler than multi-cpu is that on a mainboard, you simply can't run all wires between the two chips which you'd need to make sync effective. Plus a signal only travels 30cm/ns tops (speed of light; in a wire, you usually have much less). And don't forget that, on a multi-layer mainboard, signals start to influence each other (crosstalk). We like to think that 0 is 0V and 1 is 5V but in reality, "0" is something between -0.5V (overdrive when dropping a line from 1->0) and .5V and "1" is anything above 0.8V.

If you have everything inside of a single chip, signals run much faster and you can have as many as you like (well, almost :). Also, signal crosstalk is much easier to control.
