Critical sections with multicore processors


Question

With a single-core processor, where all your threads run on the one CPU, the idea of implementing a critical section using an atomic test-and-set operation on some mutex (or semaphore, etc.) in memory seems straightforward enough; because your processor is executing a test-and-set from one spot in your program, it necessarily can't be doing one from another spot in your program disguised as some other thread.

But what happens when you actually have more than one physical processor? It seems that simple instruction-level atomicity wouldn't be sufficient, because with two processors potentially executing their test-and-set operations at the same time, what you really need to maintain atomicity on is access to the shared memory location of the mutex. (And if the shared memory location is loaded into cache, there's the whole cache-consistency problem to deal with, too.)

This seems like it would incur far more overhead than the single-core case, so here's the meat of the question: How much worse is it? Is it worse? Do we just live with it? Or do we sidestep it by enforcing a policy that all threads within a process group have to live on the same physical core?

Answer


Multi-core/SMP systems are not just several CPUs glued together; there's explicit support for doing things in parallel. All the synchronization primitives are implemented with hardware help, along the lines of an atomic CAS. The instruction either locks the bus shared by the CPUs, the memory controller, and devices that do DMA while it updates memory, or just updates memory relying on cache snooping. This in turn causes the cache-coherency algorithm to kick in, forcing all involved parties to flush their caches.
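A minimal sketch of a lock built "along the lines of atomic CAS," using C11 atomics (identifiers are illustrative, not from the answer). On x86 the compare-exchange typically compiles to `LOCK CMPXCHG`; on ARM/POWER it becomes an LL/SC loop — both are the hardware support the answer refers to:

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int cas_word = 0;  /* 0 = free, 1 = held */

/* One attempt to take the lock: atomically swap 0 -> 1.
 * Succeeds only if the shared word was 0 at the instant of the CAS. */
bool cas_try_lock(atomic_int *l) {
    int expected = 0;
    return atomic_compare_exchange_strong(l, &expected, 1);
}

void cas_lock(atomic_int *l) {
    while (!cas_try_lock(l))
        ; /* spin: another core currently holds the lock */
}

void cas_unlock(atomic_int *l) {
    atomic_store_explicit(l, 0, memory_order_release);
}
```

The failed-CAS path is where the multicore cost shows up: each attempt is a read-modify-write on the shared cache line, so contended locks bounce that line between cores via the coherency protocol.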

Disclaimer: this is a very basic description; there are more interesting things here, like virtual vs. physical caches, cache write-back policies, memory models, fences, etc.

If you want to know more about how an OS might use these hardware facilities, here's an excellent book on the subject.
