Do atomic operations become slower as more CPUs are added?

Question

x86 and other architectures provide special atomic instructions (the lock prefix, cmpxchg, etc.) that allow you to write "lock-free" data structures. But as more and more cores are added, it seems that the work these instructions actually have to do behind the scenes will grow (at least to maintain cache coherency?). If an atomic add takes ~100 cycles today on a dual-core system, might it take significantly longer on the 80+ core machines of the future? If you're writing code to last, might it actually be a better idea to use locks, even if they're slower today?

Recommended answer

You are right that topology constraints will, one way or another, increase the latency of communication between cores once core counts go higher than a couple dozen. I don't really know how the x86 companies intend to deal with that sort of scaling.

But locks are implemented in terms of atomic operations. So you don't really win by switching to them, unless they are implemented in a more scalable way than what you would attempt with your own hand-rolled atomic operations. I think that generally, for contention over a single token-like resource, atomic primitives will always be the fastest way, regardless of how many cores you have.
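To make the point above concrete, here is a minimal C++ sketch, not taken from the original answer. `SpinLock`, `count_with_atomic`, and `count_with_lock` are hypothetical names invented for illustration. The lock path needs an atomic read-modify-write just to acquire the lock, plus an atomic store to release it, so under contention it touches the shared cache line at least as often as the single atomic add does.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical minimal spinlock (illustration only). Acquiring it is itself
// an atomic read-modify-write on one shared memory location.
class SpinLock {
public:
    void lock() {
        // exchange() returns the previous value; true means someone else holds it.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            std::this_thread::yield();  // back off while the holder finishes
        }
    }
    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// Shared counter bumped with one atomic add per increment.
long count_with_atomic(int threads, int iters) {
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, iters] {
            for (int i = 0; i < iters; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter.load();
}

// Same counter guarded by the spinlock: an atomic exchange to acquire,
// a plain add, then an atomic store to release.
long count_with_lock(int threads, int iters) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&lock, &counter, iters] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

Both functions return `threads * iters`. The sketch only shows where the atomic operations sit; it makes no claim about actual cycle counts on any particular machine.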

As Cray discovered a long time ago, there's no free lunch here. High-level software design that touches potentially contended resources as infrequently as possible will always lead to the biggest payoff in massively parallel applications. This means doing as much work as possible per lock acquisition, but holding the lock for as short a time as possible. In extreme situations, this can mean pre-calculating your work on the assumption that the lock will be acquired, trying to grab it, and completing as fast as possible on success; on failure, you throw away your work and retry.
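The "pre-calculate, try, discard and retry" pattern can be sketched in C++ as follows. This is an illustrative assumption rather than the answerer's code: it uses a compare-and-swap in place of the lock the answer mentions, and `next_state` and `optimistic_update` are hypothetical names. The speculative work happens outside any critical section, and the only contended step is the final publish.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for the expensive work done before touching the
// shared resource.
std::uint64_t next_state(std::uint64_t s) {
    return s * 2 + 1;
}

// Computes the new value from a snapshot, then tries to publish it with a
// single compare-and-swap. If another thread changed `shared` in the
// meantime, the speculative result is thrown away and recomputed.
// Returns how many attempts were needed.
int optimistic_update(std::atomic<std::uint64_t>& shared) {
    int attempts = 0;
    std::uint64_t seen = shared.load(std::memory_order_acquire);
    for (;;) {
        ++attempts;
        std::uint64_t proposed = next_state(seen);  // speculative, uncontended
        // On failure, compare_exchange_weak reloads `seen` with the current
        // value, so the next iteration recomputes from fresh state.
        if (shared.compare_exchange_weak(seen, proposed,
                                         std::memory_order_acq_rel,
                                         std::memory_order_acquire)) {
            return attempts;
        }
    }
}
```

With no contention the loop normally succeeds on the first attempt, although `compare_exchange_weak` is allowed to fail spuriously. Under heavy contention the discarded work grows, which is why the answer still recommends touching the shared resource as rarely as possible.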
