Are memory barriers needed because of CPU out-of-order execution or because of cache consistency problems?

Question

I'm wondering why memory barriers are needed, and I have read some articles about this topic.
Some say it's because of CPU out-of-order execution, while others say it's because of the cache consistency problems caused by the store buffer and the invalidate queue.
So what's the real reason memory barriers are needed: CPU out-of-order execution, cache consistency problems, or both? Does CPU out-of-order execution have anything to do with cache consistency? And what's the difference between x86 and ARM?

Answer

You need barriers to order this core's (or thread's) accesses to globally visible, coherent cache when the ISA's memory-ordering rules are weaker than the semantics your algorithm needs.
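
As a minimal sketch (assuming C++11 or later; `payload`, `ready`, `producer`, `consumer` and `run_message_passing` are hypothetical names), here is an algorithm that needs more ordering than relaxed accesses provide: the consumer must never see the flag before the data. The two `std::atomic_thread_fence` calls are the barriers. On x86 these particular fences compile to no instruction at all, while on ARM they typically become a `dmb` barrier.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical shared state for a producer/consumer hand-off.
int payload = 0;                  // plain, non-atomic data
std::atomic<bool> ready{false};   // flag published after the data

void producer() {
    payload = 42;                                         // write the data
    std::atomic_thread_fence(std::memory_order_release);  // barrier: the write above may not move after the flag store
    ready.store(true, std::memory_order_relaxed);         // publish the flag
}

int consumer() {
    while (!ready.load(std::memory_order_relaxed)) {
        // spin until the flag becomes visible
    }
    std::atomic_thread_fence(std::memory_order_acquire);  // barrier: the read below may not move before the flag load
    return payload;                                       // must observe 42
}

// Runs one hand-off and returns what the consumer read.
int run_message_passing() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;
    std::thread c([&seen] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}
```

For example, `assert(run_message_passing() == 42);` should always hold. Without both fences, the read of `payload` would be a data race in C++.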

Cache is always coherent, but that's a separate thing from consistency (ordering between multiple operations).
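
To illustrate the coherence side of that distinction, here is a sketch that assumes C++'s per-object modification order is a fair stand-in for hardware cache coherence (the function name is hypothetical). All threads agree on a single order of writes to one atomic location, so relaxed atomic increments are never lost. However, `memory_order_relaxed` promises nothing about the order of accesses to different locations.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Every thread agrees on one modification order for `counter` (per-location
// coherence), and each fetch_add reads the latest value in that order, so no
// increment is lost. Relaxed operations impose no ordering on accesses to
// *other* locations; that cross-location ordering is what barriers add.
long count_with_relaxed_increments(int threads, int per_thread) {
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);  // atomic RMW, no ordering of other accesses
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter.load(std::memory_order_relaxed);
}
```

For example, `assert(count_with_relaxed_increments(4, 100000) == 400000);` holds. Ordering between different variables is the consistency question, and that is what barriers address.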

You can have memory reordering on an in-order CPU. In more detail, "How is load->store reordering possible with in-order commit?" shows how you can get memory reordering on a pipeline that starts executing instructions in program order but has a cache that allows hit-under-miss and/or a store buffer that allows out-of-order (OoO) commit.
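
The store-buffer part of that argument can be shown with a small deterministic toy model. The model is entirely hypothetical and does not describe any real CPU. Each `Core` executes strictly in program order, but its stores wait in a private FIFO buffer (with store-to-load forwarding) before they reach coherent shared memory. Hit-under-miss caches are not modeled. Running the classic store-buffer litmus test on this model produces `r0 == 0 && r1 == 0`, a StoreLoad reordering, even though nothing executes out of order. Draining the buffer after each store, which is effectively what a full barrier such as x86 `mfence` does, rules that outcome out.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <string>
#include <utility>

// Toy in-order core: stores go to a private FIFO store buffer and only
// reach shared (coherent) memory when the buffer drains.
struct Core {
    std::deque<std::pair<std::string, int>> store_buffer;

    void store(const std::string& addr, int value) {
        store_buffer.emplace_back(addr, value);  // executed in order, not yet globally visible
    }

    int load(const std::map<std::string, int>& memory, const std::string& addr) const {
        // Store forwarding: a core sees its own buffered stores first (newest wins).
        for (auto it = store_buffer.rbegin(); it != store_buffer.rend(); ++it) {
            if (it->first == addr) return it->second;
        }
        return memory.at(addr);  // otherwise read coherent shared memory
    }

    void drain(std::map<std::string, int>& memory) {
        // Commit buffered stores to shared memory in FIFO (program) order.
        while (!store_buffer.empty()) {
            memory[store_buffer.front().first] = store_buffer.front().second;
            store_buffer.pop_front();
        }
    }
};

// Store-buffer litmus test:  core0: x = 1; r0 = y;   core1: y = 1; r1 = x;
// Schedule: both loads run before any non-barrier store has drained.
std::pair<int, int> store_buffer_litmus(bool full_barrier) {
    std::map<std::string, int> memory{{"x", 0}, {"y", 0}};
    Core core0, core1;
    core0.store("x", 1);
    if (full_barrier) core0.drain(memory);  // models a full barrier (e.g. mfence) on core 0
    core1.store("y", 1);
    if (full_barrier) core1.drain(memory);  // models a full barrier on core 1
    int r0 = core0.load(memory, "y");
    int r1 = core1.load(memory, "x");
    core0.drain(memory);
    core1.drain(memory);
    return {r0, r1};
}
```

For example, `store_buffer_litmus(false)` returns {0, 0}, and `store_buffer_litmus(true)` returns {1, 1}.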

Related:

  • "Does an x86 CPU reorder instructions?" talks about the difference between memory reordering and out-of-order execution. (It also explains how x86's strongly ordered memory model is implemented on top of aggressive out-of-order execution: the hardware tracks ordering, and the store buffer decouples store execution from store visibility to other threads/cores.)
  • x86 memory ordering: Loads Reordered with Earlier Stores vs. Intra-Processor Forwarding
  • Globally Invisible load instructions

See also https://preshing.com/20120710/memory-barriers-are-like-source-control-operations/ and https://preshing.com/20120930/weak-vs-strong-memory-models for more basics. x86 has a "strong" memory ordering model: program order plus a store buffer with store forwarding. C++ acquire and release are "free"; only atomic RMWs and seq_cst stores need barriers.
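
Here is a sketch of the case where x86 does need a barrier. It uses the store-buffer litmus test with real threads (`x`, `y` and `count_both_zero` are hypothetical names). With `memory_order_seq_cst` on every access, the outcome where both loads read 0 is forbidden. On x86-64, GCC and Clang typically implement each seq_cst store with `xchg` (or `mov` plus `mfence`), which drains the store buffer, while the seq_cst loads remain plain `mov`s. If the stores were `release` and the loads `acquire`, all four would be plain `mov`s, (0, 0) would be allowed, and x86 hardware can produce it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> x{0}, y{0};

// Store-buffer litmus test with seq_cst everywhere: r0 == 0 && r1 == 0 is
// forbidden by the C++ memory model. Returns how often it was observed.
int count_both_zero(int iterations) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        x.store(0);
        y.store(0);
        int r0 = -1, r1 = -1;
        std::thread t0([&r0] {
            x.store(1, std::memory_order_seq_cst);  // needs a full barrier (or xchg) on x86
            r0 = y.load(std::memory_order_seq_cst);
        });
        std::thread t1([&r1] {
            y.store(1, std::memory_order_seq_cst);
            r1 = x.load(std::memory_order_seq_cst);
        });
        t0.join();
        t1.join();
        if (r0 == 0 && r1 == 0) ++both_zero;
    }
    return both_zero;
}
```

`count_both_zero(1000)` should return 0. The loop only samples a few schedules, so it demonstrates the guarantee rather than proving it.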

ARM has a "weak" memory ordering model: only C++ memory_order_consume (data-dependency ordering) is "free"; acquire and release require special instructions (like ldar / stlr) or barriers.
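
Here is a sketch of a publish/consume pattern written for a weakly ordered ISA (`Message`, `mailbox`, `send`, `receive_sum` and `run_mailbox` are hypothetical). The release store and the acquire load carry the ordering themselves, so no separate fence is needed. On AArch64, compilers typically emit `stlr` and `ldar` for them, and on x86 both are plain `mov`s. Because the reader only dereferences the pointer it loaded, `memory_order_consume` would be enough in principle, but mainstream compilers currently treat `consume` as `acquire`.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Message { int a; int b; };  // hypothetical payload

std::atomic<Message*> mailbox{nullptr};

void send(Message* m) {
    m->a = 1;
    m->b = 2;
    // Release store: earlier writes become visible before the pointer does
    // (typically stlr on AArch64).
    mailbox.store(m, std::memory_order_release);
}

int receive_sum() {
    Message* m;
    // Acquire load: later reads cannot be satisfied before the pointer is seen
    // (typically ldar on AArch64).
    while ((m = mailbox.load(std::memory_order_acquire)) == nullptr) {
        // spin until a message is published
    }
    return m->a + m->b;  // both fields are guaranteed visible here
}

// Runs one send/receive pair and returns the receiver's sum.
int run_mailbox() {
    Message msg{0, 0};
    mailbox.store(nullptr, std::memory_order_relaxed);
    int sum = -1;
    std::thread r([&sum] { sum = receive_sum(); });
    std::thread s([&msg] { send(&msg); });
    s.join();
    r.join();
    return sum;
}
```

For example, `assert(run_mailbox() == 3);` should always hold.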
