What formally guarantees that non-atomic variables can't see out-of-thin-air values and create a data race like atomic relaxed theoretically can?

Question

This is a question about the formal guarantees of the C++ standard.

The standard points out that the rules for std::memory_order_relaxed atomic variables allow "out of thin air" / "out of the blue" values to appear.

But for non-atomic variables, can this example have UB? Is r1 == r2 == 42 possible in the C++ abstract machine? Neither variable is 42 initially, so you'd expect neither if body to execute, meaning no writes to the shared variables.

// Global state
int x = 0, y = 0;

// Thread 1:
r1 = x;
if (r1 == 42) y = r1;

// Thread 2:
r2 = y;
if (r2 == 42) x = 42;

The above example is adapted from the standard, which explicitly says such behavior is allowed by the specification for atomic objects:

[Note: The requirements do allow r1 == r2 == 42 in the following example, with x and y initially zero:

// Thread 1:
r1 = x.load(memory_order_relaxed);
if (r1 == 42) y.store(r1, memory_order_relaxed);
// Thread 2:
r2 = y.load(memory_order_relaxed);
if (r2 == 42) x.store(42, memory_order_relaxed);

However, implementations should not allow such behavior. – end note]

What part of the so-called "memory model" protects non-atomic objects from these interactions caused by reads seeing out-of-thin-air values?

When a race condition would exist with different values for x and y, what guarantees that a read of a shared variable (normal, non-atomic) cannot see such values?

Can not-executed if bodies create self-fulfilling conditions that lead to a data-race?

Recommended answer

The text of your question seems to be missing the point of the example and out-of-thin-air values. Your example does not contain data-race UB. (It might if x or y were set to 42 before those threads ran, in which case all bets are off and the other answers citing data-race UB apply.)

There is no protection against real data races, only against out-of-thin-air values.

I think you're really asking how to reconcile that mo_relaxed example with sane and well-defined behaviour for non-atomic variables. That's what this answer covers.

This gap does not (I think) apply to non-atomic objects, only to mo_relaxed.

They say "However, implementations should not allow such behavior. – end note]". Apparently the standards committee couldn't find a way to formalize that requirement, so for now it's just a note, but it is not intended to be optional.

It's clear that even though this isn't strictly normative, the C++ standard intends to disallow out-of-thin-air values for relaxed atomics (and in general, I assume). Later standards discussion, e.g. 2018's p0668r5: Revising the C++ memory model (which doesn't "fix" this, it's an unrelated change), includes juicy side-notes like:

We still do not have an acceptable way to make our informal (since C++14) prohibition of out-of-thin-air results precise. The primary practical effect of that is that formal verification of C++ programs using relaxed atomics remains unfeasible. The above paper suggests a solution similar to http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3710.html . We continue to ignore the problem here ...

So yes, the normative parts of the standard are apparently weaker for relaxed atomics than they are for non-atomic variables. This seems to be an unfortunate side effect of how they define the rules.

AFAIK no implementations can produce out-of-thin-air values in real life.

Later versions of the standard phrase the informal recommendation more clearly, e.g. in the current draft: https://timsong-cpp.github.io/cppwp/atomics.order#8

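  8. Implementations should ensure that no "out-of-thin-air" values are computed that circularly depend on their own computation.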
    ...

[ Note: The recommendation [of 8.] similarly disallows r1 == r2 == 42 in the following example, with x and y again initially zero:

  // Thread 1:
  r1 = x.load(memory_order::relaxed);
  if (r1 == 42) y.store(42, memory_order::relaxed);
  // Thread 2:
  r2 = y.load(memory_order::relaxed);
  if (r2 == 42) x.store(42, memory_order::relaxed);

– end note]
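
As a rough sketch of how one might probe this in practice, the harness below runs the relaxed-atomic example repeatedly and counts runs where either thread read 42. The function name count_thin_air_results and the trial loop are hypothetical and not part of the standard or the original answer. It assumes an ordinary mainstream implementation; as noted above, the normative wording alone does not formally rule out a non-zero count, even though no known implementation produces one.

```cpp
#include <atomic>
#include <thread>

// Hypothetical probe: runs the standard's relaxed-atomic example `trials`
// times and returns how many runs ended with r1 == 42 or r2 == 42.
int count_thin_air_results(int trials) {
    int hits = 0;
    for (int i = 0; i < trials; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;  // each is written by exactly one thread

        std::thread t1([&] {
            r1 = x.load(std::memory_order_relaxed);
            if (r1 == 42) y.store(42, std::memory_order_relaxed);
        });
        std::thread t2([&] {
            r2 = y.load(std::memory_order_relaxed);
            if (r2 == 42) x.store(42, std::memory_order_relaxed);
        });

        // join() synchronizes with thread completion, so reading r1 and r2
        // afterwards is race-free.
        t1.join();
        t2.join();
        if (r1 == 42 || r2 == 42) ++hits;
    }
    return hits;
}
```

A zero count from such a probe is only evidence about one implementation; it says nothing about what the formal rules permit.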


(The rest of this answer was written before I was sure that the standard intended to disallow this for mo_relaxed, too.)

I'm pretty sure the C++ abstract machine does not allow r1 == r2 == 42.
Every possible ordering of C++ abstract-machine operations leads to r1=r2=0 without UB, even without synchronization. Therefore the program has no UB and any non-zero result would violate the "as-if" rule.
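
To make that concrete, here is a minimal runnable sketch of the question's non-atomic example. The wrapper function run_plain_once and the use of std::thread are assumptions for illustration; the original only shows the two thread bodies. Because neither if body can execute, x and y are only ever read concurrently, which is not a data race.

```cpp
#include <thread>
#include <utility>

// Hypothetical harness: runs the two thread bodies from the question once
// and returns the values (r1, r2) that the threads read.
std::pair<int, int> run_plain_once() {
    int x = 0, y = 0;      // plain, non-atomic shared variables
    int r1 = -1, r2 = -1;  // each is written by exactly one thread

    std::thread t1([&] {
        r1 = x;
        if (r1 == 42) y = r1;  // not reached: x is never 42
    });
    std::thread t2([&] {
        r2 = y;
        if (r2 == 42) x = 42;  // not reached: y is never 42
    });

    // join() synchronizes with thread completion, so reading r1 and r2
    // here does not race with the writes inside the threads.
    t1.join();
    t2.join();
    return {r1, r2};
}
```

Both threads only read x and y, so every interleaving in the abstract machine yields the pair (0, 0).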

Formally, ISO C++ allows an implementation to implement functions / programs in any way that gives the same result as the C++ abstract machine would. For multi-threaded code, an implementation can pick one possible abstract-machine ordering and decide that's the ordering that always happens. (e.g. when reordering relaxed atomic stores when compiling to asm for a strongly-ordered ISA. The standard as written even allows coalescing atomic stores but compilers choose not to). But the result of the program always has to be something the abstract machine could have produced. (Only the Atomics chapter introduces the possibility of one thread observing the actions of another thread without mutexes. Otherwise that's not possible without data-race UB).

I think the other answers didn't look carefully enough at this. (And neither did I when it was first posted). Code that doesn't execute doesn't cause UB (including data-race UB), and compilers aren't allowed to invent writes to objects. (Except in code paths that already unconditionally write them, like y = (x==42) ? 42 : y; which would obviously create data-race UB.)
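
The difference between a conditional store and a store on every path can be sketched with an instrumented integer. CountedInt and the two function names below are hypothetical helpers introduced only for this illustration; they count assignments so the extra write becomes visible in a single-threaded test.

```cpp
// Hypothetical instrumented int that counts how often it is assigned.
struct CountedInt {
    int value = 0;
    int writes = 0;
    CountedInt& operator=(int v) {
        value = v;
        ++writes;
        return *this;
    }
};

// Writes y only when the condition holds. If x != 42, y is untouched,
// so another thread may legally read y at the same time.
void conditional_store(int x, CountedInt& y) {
    if (x == 42) y = 42;
}

// Same final value, but y is assigned on every call. With a plain int
// shared between threads, this unconditional write could race with
// another thread's access to y.
void unconditional_store(int x, CountedInt& y) {
    y = (x == 42) ? 42 : y.value;
}
```

A compiler may not turn the first form into the second for a shared non-atomic object, because that would invent a write the abstract machine never performs.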

For any non-atomic object, if we don't actually write it then other threads might also be reading it, regardless of code inside not-executed if blocks. The standard allows this and doesn't allow a variable to suddenly read as a different value when the abstract machine hasn't written it. (And for objects we don't even read, like neighbouring array elements, another thread might even be writing them.)

Therefore we can't do anything that would let another thread temporarily see a different value for the object, or step on its write. Inventing writes to non-atomic objects is basically always a compiler bug; this is well known and universally agreed upon because it can break code that doesn't contain UB (and has done so in practice in a few cases of compiler bugs that invented writes, e.g. IA-64 GCC I think had such a bug at one point that broke the Linux kernel). IIRC, Herb Sutter mentioned such bugs in part 1 or 2 of his talk, "atomic<> Weapons: The C++ Memory Model and Modern Hardware", saying that it was already usually considered a compiler bug before C++11, but C++11 codified that and made it easier to be sure.

Or another recent example with ICC for x86: Crash with icc: can the compiler invent writes where none existed in the abstract machine?

In the C++ abstract machine, there's no way for execution to reach either y = r1; or x = 42;, regardless of sequencing or simultaneity of the loads for the branch conditions. x and y both read as 0 and neither thread ever writes them.

No synchronization is required to avoid UB because no order of abstract-machine operations leads to a data-race. The ISO C++ standard doesn't have anything to say about speculative execution or what happens when mis-speculation reaches code. That's because speculation is a feature of real implementations, not of the abstract machine. It's up to implementations (HW vendors and compiler writers) to ensure the "as-if" rule is respected.

It's legal in C++ to write code like if (global_id == mine) shared_var = 123; and have all threads execute it, as long as at most one thread actually runs the shared_var = 123; statement. (And as long as synchronization exists to avoid a data race on non-atomic int global_id). If things like this broke down, it would be chaos. For example, you could apparently draw wrong conclusions like reordering atomic operations in C++
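
A minimal sketch of that pattern, assuming the ID is fixed before any thread starts (thread creation then provides the synchronization for reading global_id). The function name run_single_writer and its parameters are hypothetical.

```cpp
#include <thread>
#include <vector>

// Hypothetical demo: every thread executes the same if statement, but at
// most one of them actually performs the store to shared_var.
int run_single_writer(int num_threads, int chosen_id) {
    int global_id = chosen_id;  // written before any thread is created
    int shared_var = 0;         // plain, non-atomic

    std::vector<std::thread> threads;
    for (int mine = 0; mine < num_threads; ++mine) {
        // Constructing a std::thread synchronizes with the start of its
        // function, so reading global_id inside the thread is race-free.
        threads.emplace_back([&global_id, &shared_var, mine] {
            if (global_id == mine) shared_var = 123;
        });
    }
    for (auto& t : threads) t.join();
    return shared_var;  // safe to read after all joins
}
```

If chosen_id matches no thread, the store never runs and shared_var stays 0; threads whose condition is false never touch shared_var at all.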

Observing that a non-write didn't happen isn't data-race UB.

It's also not UB to run if(i<SIZE) return arr[i]; because the array access only happens if i is in bounds.
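
The same principle in a self-contained form: the out-of-bounds access is UB only if it actually executes. The array contents, SIZE value, and the get_or helper with its fallback parameter are illustrative assumptions.

```cpp
#include <cstddef>

constexpr std::size_t SIZE = 4;
const int arr[SIZE] = {10, 20, 30, 40};

// Returns arr[i] when i is in bounds, otherwise the caller's fallback.
// The array access sits behind the check, so an out-of-range i never
// reaches it and no UB occurs.
int get_or(std::size_t i, int fallback) {
    if (i < SIZE) return arr[i];
    return fallback;
}
```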

I think the "out of the blue" value-invention note only applies to relaxed-atomics, apparently as a special caveat for them in the Atomics chapter. (And even then, AFAIK it can't actually happen on any real C++ implementations, certainly not mainstream ones. At this point implementations don't have to take any special measures to make sure it can't happen for non-atomic variables.)

I'm not aware of any similar language outside the atomics chapter of the standard that allows an implementation to allow values to appear out of the blue like this.

I don't see any sane way to argue that the C++ abstract machine causes UB at any point when executing this, but seeing r1 == r2 == 42 would imply that an unsynchronized read+write had happened, and that's data-race UB. If that can happen, can an implementation invent UB because of speculative execution (or some other reason)? The answer has to be "no" for the C++ standard to be usable at all.

For relaxed atomics, inventing the 42 out of nowhere wouldn't imply that UB had happened; perhaps that's why the standard says it's allowed by the rules? As far as I know, nothing outside the Atomics chapter of the standard allows it.

(Nobody wants this, hopefully everyone agrees that it would be a bad idea to build hardware like this. It seems unlikely that coupling speculation across logical cores

For 42 to be possible, thread 1 has to see thread 2's speculative store and the store from thread 1 has to be seen by thread 2's load. (Confirming that branch speculation as good, allowing this path of execution to become the real path that was actually taken.)

i.e. speculation across threads: Possible on current HW if they ran on the same core with only a lightweight context switch, e.g. coroutines or green threads.

But on current HW, memory reordering between threads is impossible in that case. Out-of-order execution of code on the same core gives the illusion of everything happening in program order. To get memory reordering between threads, they need to be running on different cores.

So we'd need a design that coupled together speculation between two logical cores. Nobody does that because it means more state needs to roll back if a mispredict is detected. But it is hypothetically possible. For example, an OoO SMT core that allows store-forwarding between its logical cores even before they've retired from the out-of-order core (i.e. become non-speculative).

PowerPC allows store-forwarding between logical cores for retired stores, meaning that threads can disagree about the global order of stores. But waiting until they "graduate" (i.e. retire) and become non-speculative means it doesn't tie together speculation on separate logical cores. So when one is recovering from a branch miss, the others can keep the back-end busy. If they all had to roll back on a mispredict on any logical core, that would defeat a significant part of the benefit of SMT.
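
Threads disagreeing about the order of two independent stores is usually probed with the IRIW (independent reads of independent writes) litmus test. The sketch below is an assumption-laden illustration, not something from the original answer: the function name count_iriw_disagreements is hypothetical, and it uses memory_order_seq_cst, for which C++ requires a single total order, so the disagreement outcome must never appear. With acquire/release orderings instead, the C++ model would permit that outcome.

```cpp
#include <atomic>
#include <thread>

// Hypothetical IRIW probe: two writers store to x and y independently while
// two readers load them in opposite orders. Returns how many trials ended
// with the readers disagreeing about which store happened first.
int count_iriw_disagreements(int trials) {
    int disagreements = 0;
    for (int i = 0; i < trials; ++i) {
        std::atomic<int> x{0}, y{0};
        int a = 0, b = 0, c = 0, d = 0;

        std::thread w1([&] { x.store(1, std::memory_order_seq_cst); });
        std::thread w2([&] { y.store(1, std::memory_order_seq_cst); });
        std::thread rd1([&] {
            a = x.load(std::memory_order_seq_cst);
            b = y.load(std::memory_order_seq_cst);
        });
        std::thread rd2([&] {
            c = y.load(std::memory_order_seq_cst);
            d = x.load(std::memory_order_seq_cst);
        });
        w1.join();
        w2.join();
        rd1.join();
        rd2.join();

        // Reader 1 saw x before y, while reader 2 saw y before x.
        if (a == 1 && b == 0 && c == 1 && d == 0) ++disagreements;
    }
    return disagreements;
}
```

Even where the hardware allows such disagreement, it involves only retired, non-speculative stores, so it is a different effect from the out-of-thin-air case.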

I thought for a while I'd found an ordering that led to this on a single core of a real weakly-ordered CPU (with user-space context switching between the threads), but the final step's store can't forward to the first step's load because this is program order and OoO exec preserves that.

  • T2: r2 = y; stalls (e.g. cache miss).
  • T2: branch prediction predicts that r2 == 42 will be true (so x = 42 should run).
  • T2: x = 42 runs. (Still speculative; r2 = y hasn't obtained a value yet, so the r2 == 42 compare/branch is still waiting to confirm that speculation.)
  • A context switch to Thread 1 happens without rolling back the CPU to retirement state or otherwise waiting for speculation to be confirmed as good or detected as mis-speculation.

This is the part that doesn't happen on real C++ implementations, which don't still use Green Threads. Real CPUs don't rename the privilege level: they don't take interrupts or otherwise enter the kernel with speculative instructions in flight that might need to rollback and redo entering kernel mode from a different architectural state.

Note that x = 42 doesn't have a data dependency on r2 so value-prediction isn't required to make this happen. And the y=r1 is inside an if(r1 == 42) anyway so the compiler can optimize to y=42 if it wants, breaking the data dependency in the other thread and making things symmetric.

Note that the arguments about Green Threads or other context switches on a single core aren't actually relevant: we need separate cores for the memory reordering.

I commented earlier that I thought this might involve value-prediction. The ISO C++ standard's memory model is certainly weak enough to allow the kinds of crazy "reordering" that value-prediction can create, but it's not necessary for this reordering. y=r1 can be optimized to y=42, and the original code includes x=42 anyway, so there's no data dependency of that store on the r2=y load. Speculative stores of 42 are easily possible without value prediction. (The problem is getting the other thread to see them!)

Speculating because of branch prediction instead of value prediction has the same effect here. And in both cases the loads need to eventually see 42 to confirm the speculation as correct.

Value-prediction doesn't even help make this reordering more plausible. We still need inter-thread speculation and memory reordering for the two speculative stores to confirm each other and bootstrap themselves into existence.

ISO C++ chooses to allow this for relaxed atomics, but AFAICT it disallows this for non-atomic variables. I'm not sure I see exactly what in the standard does allow the relaxed-atomic case in ISO C++, beyond the note saying it's not explicitly disallowed. If there was any other code that did anything with x or y then maybe, but I think my argument does apply to the relaxed-atomic case as well. No path through the source in the C++ abstract machine can produce it.

As I said, it's not possible in practice AFAIK on any real hardware (in asm), or in C++ on any real C++ implementation. It's more of an interesting thought-experiment into crazy consequences of very weak ordering rules, like C++'s relaxed-atomic. (Those ordering rules don't disallow it, but I think the as-if rule and the rest of the standard does, unless there's some provision that allows relaxed atomics to read a value that was never actually written by any thread.)

If there is such a rule, it would only be for relaxed atomics, not for non-atomic variables. Data-race UB is pretty much all the standard needs to say about non-atomic vars and memory ordering, but we don't have that.
