ARM STLR memory ordering semantics

Question

I'm struggling with the exact semantics of the ARM STLR instruction.

According to the documentation, it has release semantics. So for an STLR store you would get:

[StoreStore][LoadStore]
X=r1

Here X is a memory location and r1 is a register.

The problem is that a release store followed by an acquire load fails to provide sequential consistency:

[StoreStore][LoadStore]
X=r1
r2=Y
[LoadLoad][LoadStore]

In the above case, X=r1 and r2=Y are allowed to be reordered. To make this sequentially consistent, a [StoreLoad] needs to be added:

[StoreStore][LoadStore]
X=r1
[StoreLoad]
r2=Y
[LoadLoad][LoadStore]

You normally put this fence on the store side because loads are more frequent.
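
As a rough illustration of the fence diagram above, here is a C++ sketch of the classic store-buffering test. It assumes that `std::atomic_thread_fence(std::memory_order_seq_cst)` can stand in for the [StoreLoad] barrier; how a compiler lowers that fence is target-dependent. The names `SbResult`, `run_sb_round_with_fence` and `count_both_zero` are made up for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Values loaded by the two threads in one round (hypothetical helper type).
struct SbResult { int r1; int r2; };

// One store-buffering round: each thread does a release store, a full fence
// (standing in for [StoreLoad]), then an acquire load of the other variable.
SbResult run_sb_round_with_fence() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;
    std::thread t1([&] {
        x.store(1, std::memory_order_release);                // X=r1
        std::atomic_thread_fence(std::memory_order_seq_cst);  // [StoreLoad]
        r1 = y.load(std::memory_order_acquire);               // r2=Y
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_release);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_acquire);
    });
    t1.join();
    t2.join();
    return {r1, r2};
}

// Counts the rounds in which both threads read 0, the outcome that
// sequential consistency forbids.
int count_both_zero(int rounds) {
    assert(rounds >= 0);
    int hits = 0;
    for (int i = 0; i < rounds; ++i) {
        SbResult r = run_sb_round_with_fence();
        if (r.r1 == 0 && r.r2 == 0) ++hits;
    }
    return hits;
}
```

With the fence in both threads, the C++ memory model rules out the both-zero outcome, so `count_both_zero` should return 0. Removing the fence makes that outcome legal, although spawning a thread per round is a weak way to provoke it on real hardware.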

On x86, plain stores are release stores and plain loads are acquire loads. The [StoreLoad] can be implemented with an MFENCE, or with LOCK ADDL $0, (%RSP) as is done in the HotSpot JVM.

Looking at the ARM documentation, it seems that an LDAR has acquire semantics, so that would be [LoadLoad][LoadStore].

But the semantics of STLR are vague. When I compile a C++ atomic store using memory_order_seq_cst, there is just an STLR; there is no DMB. So it seems that STLR has much stronger memory ordering guarantees than a release store. To me it seems that, at the fence level, an STLR is equivalent to:

 [StoreStore][LoadStore]
 X=r1
 [StoreLoad]

Could someone shed some light on this?

Recommended answer

I'm just learning about this stuff, so take this with a grain of salt. But my understanding is that in ARMv8/AArch64, STLR/LDAR do provide additional semantics beyond the usual definitions of release/acquire, though not as strong as you suggest. Namely, a release store STLR is sequentially consistent with respect to an acquire load LDAR that follows it in program order, but not with respect to ordinary LDR loads.

From the ARMv8 Architecture Reference Manual, B2.3.7, "Load-Acquire, Load-AcquirePC, and Store-Release":

Where a Load-Acquire appears in program order after a Store-Release, the memory access generated by the Store-Release instruction is Observed-by each PE to the extent that PE is required to observe the access coherently, before the memory access generated by the Load-Acquire instruction is Observed-by that PE, to the extent that the PE is required to observe the access coherently.

And from B2.3.2, "Ordering relations":

A read or a write RW1 is Barrier-ordered-before a read or a write RW2 from the same Observer if and only if RW1 appears in program order before RW2 and any of the following cases apply: [...] RW1 is a write W1 generated by an instruction with Release semantics and RW2 is a read R2 generated by an instruction with Acquire semantics.
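
To make the distinction concrete, here is a hedged C++ sketch of a store-buffering round where the store is always `memory_order_seq_cst` and the ordering of the following load is a parameter. The assumption, taken from the compiler output described in this answer, is that on AArch64 a seq_cst store becomes STLR, a seq_cst load becomes LDAR, and a relaxed load becomes a plain LDR. The names `Loads`, `sb_round` and `both_zero_rounds` are invented for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Values loaded by the two threads in one round (hypothetical helper type).
struct Loads { int r1; int r2; };

// One round: seq_cst store to one variable, then a load of the other
// variable using the caller-supplied ordering.
Loads sb_round(std::memory_order load_order) {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;
    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);  // release-style store
        r1 = y.load(load_order);                // acquire or plain load
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(load_order);
    });
    t1.join();
    t2.join();
    return {r1, r2};
}

// Counts rounds where both loads returned 0.
int both_zero_rounds(int rounds, std::memory_order load_order) {
    assert(rounds >= 0);
    int hits = 0;
    for (int i = 0; i < rounds; ++i) {
        Loads l = sb_round(load_order);
        if (l.r1 == 0 && l.r2 == 0) ++hits;
    }
    return hits;
}
```

Calling `both_zero_rounds(n, std::memory_order_seq_cst)` must return 0 under the C++ memory model. With `std::memory_order_relaxed` the both-zero outcome is permitted, but this simple harness may well never observe it; a long-running loop inside each thread is usually needed to provoke the reordering.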

As a test, I borrowed a C++ implementation of Peterson's locking algorithm by LWimsey. With clang 11.0 on Godbolt, you can see that even when sequential consistency is requested, the compiler still generates STLR, LDAR to take the lock (lines 18-19 of the assembly), with no DMB. I ran it for a while (Raspberry Pi 4B, Cortex-A72, 4 cores) and got no violations.

However, contrary to your idea, STLR can still be reordered with respect to ordinary (non-acquire) loads that follow it, so it does not implicitly include a full StoreLoad fence. I modified LWimsey's program to use STLR, LDR instead, and after adding some extra garbage to provoke the race, I was able to see lock violations.

Likewise, LDAR can be reordered with respect to ordinary (non-release) stores that precede it. I was similarly able to get lock violations with STR, LDAR in the test program.
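
For readers who want to try something similar, below is a minimal sketch of a two-thread Peterson lock using only seq_cst atomics. It is not LWimsey's program, just an illustration written for this page; `PetersonLock` and `run_peterson_test` are hypothetical names, and the split load/store on the counter is a deliberate choice so that a mutual-exclusion failure would show up as a lost update.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Two-thread Peterson lock; every access is memory_order_seq_cst.
class PetersonLock {
    std::atomic<bool> flag_[2];
    std::atomic<int> turn_{0};

public:
    PetersonLock() {
        flag_[0].store(false);
        flag_[1].store(false);
    }

    void lock(int id) {
        assert(id == 0 || id == 1);
        const int other = 1 - id;
        flag_[id].store(true, std::memory_order_seq_cst);
        turn_.store(other, std::memory_order_seq_cst);
        // The loads below must not be reordered before the stores above.
        while (flag_[other].load(std::memory_order_seq_cst) &&
               turn_.load(std::memory_order_seq_cst) == other) {
            std::this_thread::yield();
        }
    }

    void unlock(int id) {
        flag_[id].store(false, std::memory_order_seq_cst);
    }
};

// Both threads increment a shared counter under the lock, using a separate
// load and store so that overlapping critical sections would lose updates.
long run_peterson_test(int iterations) {
    PetersonLock lock;
    std::atomic<long> counter{0};
    auto worker = [&](int id) {
        for (int i = 0; i < iterations; ++i) {
            lock.lock(id);
            long v = counter.load(std::memory_order_relaxed);
            counter.store(v + 1, std::memory_order_relaxed);
            lock.unlock(id);
        }
    };
    std::thread a(worker, 0);
    std::thread b(worker, 1);
    a.join();
    b.join();
    return counter.load();
}
```

If the lock holds, `run_peterson_test(n)` returns `2 * n`. Weakening the loads in `lock` to `memory_order_relaxed`, or the stores to plain relaxed stores, would roughly correspond to the STLR, LDR and STR, LDAR variants described above, and the result is then no longer guaranteed.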
