Are memory barriers needed because of CPU out-of-order execution or because of cache consistency problems?

Question

I'm wondering why memory barriers are needed, and I have read some articles about this topic.
Some say it's because of CPU out-of-order execution, while others say it's because of cache consistency problems caused by store buffers and invalidate queues.
So, what's the real reason that memory barriers are needed? CPU out-of-order execution, cache consistency problems, or both? Does CPU out-of-order execution have something to do with cache consistency? And what's the difference between x86 and ARM?

Recommended answer

You need barriers to order this core / thread's accesses to globally-visible coherent cache when the ISA's memory ordering rules are weaker than the semantics you need for your algorithm.
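As a minimal sketch of that idea (not taken from the original answer), the following C++ publishes plain data through a relaxed flag and uses explicit fences to supply the ordering the algorithm needs. The function name `publish_with_fences` and the value 42 are hypothetical, chosen only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper for illustration: returns the value the consumer reads.
int publish_with_fences() {
    int payload = 0;                  // plain (non-atomic) data
    std::atomic<bool> ready{false};   // flag used to publish the data
    int seen = -1;

    std::thread producer([&] {
        payload = 42;
        // Release fence: the write to payload is ordered before the flag store.
        std::atomic_thread_fence(std::memory_order_release);
        ready.store(true, std::memory_order_relaxed);
    });

    std::thread consumer([&] {
        while (!ready.load(std::memory_order_relaxed)) {
            std::this_thread::yield();
        }
        // Acquire fence: the flag load is ordered before the read of payload.
        std::atomic_thread_fence(std::memory_order_acquire);
        seen = payload;
        assert(seen == 42);  // guaranteed by the release/acquire fence pairing
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Without the two fences, the C++ memory model would make the read of `payload` a data race. How much the fences cost in machine instructions depends on the ISA.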

Cache is always coherent, but that's a separate thing from consistency (ordering between multiple operations).

You can have memory reordering on an in-order CPU. In more detail, "How is load->store reordering possible with in-order commit?" shows how you can get memory reordering on a pipeline that starts executing instructions in program order, but with a cache that allows hit-under-miss and/or a store buffer that allows out-of-order commit.

Related:

  • Does an x86 CPU reorder instructions? talks about the difference between memory reordering vs. out of order exec. (And how x86's strongly ordered memory model is implemented on top of aggressive out-of-order execution by having hardware track ordering, with the store buffer decoupling store execution from store visibility to other threads/cores.)
  • x86 memory ordering: Loads Reordered with Earlier Stores vs. Intra-Processor Forwarding
  • Globally Invisible load instructions

See also https://preshing.com/20120710/memory-barriers-are-like-source-control-operations/ and https://preshing.com/20120930/weak-vs-strong-memory-models for some more basics. x86 has a "strong" memory ordering model: program order plus a store buffer with store-forwarding. On x86, C++ acquire and release are "free"; only atomic RMWs and seq_cst stores need barriers.
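The store buffer effect can be illustrated with the classic "store buffering" litmus test. This is a sketch under assumptions, not code from the original answer: the helper `count_both_zero` and its parameters are hypothetical. Each thread stores to one variable and then loads the other; with `seq_cst` the outcome where both loads return 0 is forbidden by the C++ memory model, while with `relaxed` it is allowed and can occur even on x86.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical litmus-test helper: counts the runs in which both threads
// read 0, i.e. both loads took effect before the other thread's store
// became visible.
template <std::memory_order Order>
int count_both_zero(int iterations) {
    assert(iterations > 0);
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;
        std::thread t1([&] { x.store(1, Order); r1 = y.load(Order); });
        std::thread t2([&] { y.store(1, Order); r2 = x.load(Order); });
        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}
```

`count_both_zero<std::memory_order_seq_cst>(n)` must return 0. `count_both_zero<std::memory_order_relaxed>(n)` may return a nonzero count, although spawning threads on every iteration keeps the race window small, so the reordering may rarely or never show up in practice with this simple harness.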

ARM has a "weak" memory ordering model: only C++ memory_order_consume (data dependency ordering) is "free"; acquire and release require special instructions (like ldar / stlr) or barriers.
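To make the contrast concrete, here is a small sketch (the function name `publish_with_release_acquire` and the value 42 are hypothetical) that uses release and acquire operations on the flag itself. The comments describe what mainstream compilers typically emit; the exact instructions depend on the compiler and target options.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper for illustration: returns the value the consumer reads.
int publish_with_release_acquire() {
    int payload = 0;                  // plain (non-atomic) data
    std::atomic<bool> ready{false};   // flag used to publish the data
    int seen = -1;

    std::thread producer([&] {
        payload = 42;
        // Release store: typically stlr on AArch64, a plain mov on x86.
        ready.store(true, std::memory_order_release);
    });

    std::thread consumer([&] {
        // Acquire load: typically ldar on AArch64, a plain mov on x86.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;
        assert(seen == 42);  // guaranteed: the acquire load saw the release store
    });

    producer.join();
    consumer.join();
    return seen;
}
```

The source code is identical for both ISAs. Only the generated instructions differ, which is the sense in which acquire and release are "free" on x86 but not on ARM.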
