ARM STLR memory ordering semantics

Question

I'm struggling with the exact semantics of the ARM STLR instruction.

According to the documentation, it has release semantics. So for an STLR store, you would get:

[StoreStore][LoadStore]
X=r1

Here X is a memory location and r1 is some register.

The problem is that a release store followed by an acquire load fails to provide sequential consistency:

[StoreStore][LoadStore]
X=r1
r2=Y
[LoadLoad][LoadStore]

In the above case, X=r1 and r2=Y are allowed to be reordered. To make this sequentially consistent, a [StoreLoad] needs to be added:

[StoreStore][LoadStore]
X=r1
[StoreLoad]
r2=Y
[LoadLoad][LoadStore]

You normally attach this barrier to the store, because loads are more frequent.
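The reordering described above is the classic store-buffering litmus test. Here is a minimal C++ sketch of it, assuming two threads that each store to one variable and then load the other. The names `SbResult`, `run_store_buffering`, and `count_both_zero` are hypothetical helpers made up for illustration. With `memory_order_seq_cst` on every access, the outcome where both loads return 0 is forbidden by the C++ memory model. With release stores and acquire loads it is allowed, though whether it actually shows up depends on the hardware and on timing.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// What each thread observed in one run of the litmus test.
struct SbResult {
    int r1;  // value thread 0 read from Y
    int r2;  // value thread 1 read from X
};

// Runs the store-buffering test once with the given memory orders.
template <std::memory_order StoreOrder, std::memory_order LoadOrder>
SbResult run_store_buffering() {
    std::atomic<int> x{0}, y{0};
    std::atomic<int> ready{0};
    int r1 = -1, r2 = -1;

    std::thread t0([&] {
        ready.fetch_add(1);
        while (ready.load() < 2) std::this_thread::yield();  // start together
        x.store(1, StoreOrder);   // X = 1
        r1 = y.load(LoadOrder);   // r1 = Y
    });
    std::thread t1([&] {
        ready.fetch_add(1);
        while (ready.load() < 2) std::this_thread::yield();
        y.store(1, StoreOrder);   // Y = 1
        r2 = x.load(LoadOrder);   // r2 = X
    });
    t0.join();
    t1.join();
    return {r1, r2};
}

// Counts the runs in which both threads read 0, i.e. each load appears to
// have been performed before the other thread's store became visible.
template <std::memory_order StoreOrder, std::memory_order LoadOrder>
int count_both_zero(int iterations) {
    assert(iterations > 0);
    int count = 0;
    for (int i = 0; i < iterations; ++i) {
        SbResult r = run_store_buffering<StoreOrder, LoadOrder>();
        if (r.r1 == 0 && r.r2 == 0) ++count;
    }
    return count;
}

// Sequentially consistent variant: the both-zero outcome must never occur.
int count_both_zero_seq_cst(int iterations) {
    return count_both_zero<std::memory_order_seq_cst,
                           std::memory_order_seq_cst>(iterations);
}

// Release/acquire variant: the both-zero outcome is permitted.
int count_both_zero_rel_acq(int iterations) {
    return count_both_zero<std::memory_order_release,
                           std::memory_order_acquire>(iterations);
}
```

Calling `count_both_zero_seq_cst(n)` should always return 0. `count_both_zero_rel_acq(n)` may return a nonzero count, but a zero count there proves nothing, since the window for the reordering is small.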

On x86, plain stores are release stores and plain loads are acquire loads. The [StoreLoad] can be implemented with an MFENCE or with LOCK ADDL $0, (%RSP), as is done in the HotSpot JVM.

Looking at the ARM documentation, it seems that an LDAR has acquire semantics, so that would be [LoadLoad][LoadStore].

But the semantics of STLR are vague. When I compile a C++ atomic store using memory_order_seq_cst, there is just an STLR; there is no DMB. So it seems that STLR has much stronger memory ordering guarantees than a release store. To me it seems that, at the level of fences, an STLR is equivalent to:

 [StoreStore][LoadStore]
 X=r1
 [StoreLoad]

Can someone explain this?

Recommended answer

I'm just learning about this stuff, so take this with a grain of salt. But my understanding is that in ARMv8/AArch64, STLR/LDAR do provide additional semantics beyond the usual definitions of release/acquire, but not as strong as your suggestion. Namely, a release store STLR is sequentially consistent with respect to an acquire load LDAR that follows it in program order, but not with respect to ordinary LDR loads.

From the ARMv8 Architecture Reference Manual, B2.3.7, "Load-Acquire, Load-AcquirePC, and Store-Release":

Where a Load-Acquire appears in program order after a Store-Release, the memory access generated by the Store-Release instruction is Observed-by each PE to the extent that PE is required to observe the access coherently, before the memory access generated by the Load-Acquire instruction is Observed-by that PE, to the extent that the PE is required to observe the access coherently.

And from B2.3.2, "Ordering relations":

A read or a write RW1 is Barrier-ordered-before a read or a write RW2 from the same Observer if and only if RW1 appears in program order before RW2 and any of the following cases apply: [...] RW1 is a write W1 generated by an instruction with Release semantics and RW2 is a read R2 generated by an instruction with Acquire semantics.

As a test, I borrowed a C++ implementation of Peterson's locking algorithm by LWimsey. With clang 11.0 on godbolt, you can see that even when sequential consistency is requested, the compiler still generates STLR, LDAR to take the lock (lines 18-19 of the assembly), with no DMB. I ran it for a while (Raspberry Pi 4B, Cortex-A72, 4 cores) and got no violations.
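For readers who have not seen that kind of test, here is a minimal sketch of a two-thread Peterson lock with sequentially consistent atomics. It is not LWimsey's code. `PetersonLock` and `run_counter_test` are hypothetical names used only for illustration. Based on the observation above, the seq_cst store followed by a seq_cst load in `lock()` is the STLR, LDAR pair in question on AArch64.

```cpp
#include <atomic>
#include <thread>

// Peterson's lock for exactly two threads, with ids 0 and 1.
// Every atomic access uses the default memory_order_seq_cst.
class PetersonLock {
public:
    void lock(int id) {
        int other = 1 - id;
        interested_[id].store(true);  // announce intent (seq_cst store)
        turn_.store(other);           // give priority to the other thread
        // Wait while the other thread wants the lock and has priority.
        // These seq_cst loads must not be reordered before the stores above.
        while (interested_[other].load() && turn_.load() == other) {
            std::this_thread::yield();
        }
    }

    void unlock(int id) {
        interested_[id].store(false);  // leave the critical section
    }

private:
    std::atomic<bool> interested_[2] = {{false}, {false}};
    std::atomic<int> turn_{0};
};

// Stress helper: two threads each increment a plain, non-atomic counter
// `per_thread` times under the lock. Returns the final counter value.
// A result below 2 * per_thread would indicate a mutual exclusion failure.
long run_counter_test(long per_thread) {
    PetersonLock lock;
    long counter = 0;

    auto worker = [&](int id) {
        for (long i = 0; i < per_thread; ++i) {
            lock.lock(id);
            ++counter;  // protected only by the Peterson lock
            lock.unlock(id);
        }
    };

    std::thread t0(worker, 0);
    std::thread t1(worker, 1);
    t0.join();
    t1.join();
    return counter;
}
```

With seq_cst throughout, `run_counter_test(n)` should return exactly `2 * n`. Weakening the accesses in `lock()` (for example a release store followed by a relaxed load) removes that guarantee, though a failure may take many iterations to appear.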

However, contrary to your idea, STLR can still be reordered with respect to ordinary (non-acquire) loads that follow it, so it does not implicitly include a full StoreLoad fence. I modified LWimsey's program to use STLR, LDR instead, and after adding some extra garbage to provoke the race, I was able to see lock violations.

Likewise, LDAR can be reordered with respect to ordinary (non-release) stores that precede it. I was similarly able to get lock violations with STR, LDAR in the test program.
