C# volatile variable: Memory fences VS. caching


Question


So I researched the topic for quite some time now, and I think I understand the most important concepts like the release and acquire memory fences.

However, I haven't found a satisfactory explanation for the relation between volatile and the caching of the main memory.

So, I understand that every read and write to/from a volatile field enforces strict ordering of the read as well as the write operations that precede and follow it (read-acquire and write-release). But that only guarantees the ordering of the operations. It doesn't say anything about the time these changes are visible to other threads/processors. In particular, this depends on the time the cache is flushed (if at all). I remember having read a comment from Eric Lippert saying something along the lines of "the presence of volatile fields automatically disables cache optimizations". But I'm not sure what exactly this means. Does it mean caching is completely disabled for the whole program just because we have a single volatile field somewhere? If not, what is the granularity the cache is disabled for?

Also, I read something about strong and weak volatile semantics and that C# follows the strong semantics where every write will always go straight to main memory no matter if it's a volatile field or not. I am very confused about all of this.

Solution

I'll address the last question first. Microsoft's .NET implementation has release semantics on writes1. It's not C# per se, so the same program, no matter the language, in a different implementation can have weak non-volatile writes.

The visibility of side-effects is regarding multiple threads. Forget about CPUs, cores and caches. Imagine, instead, that each thread has a snapshot of what is on the heap that requires some sort of synchronization to communicate side-effects between threads.

So, what does C# say? The C# language specification (newer draft) says fundamentally the same as the Common Language Infrastructure standard (CLI; ECMA-335 and ISO/IEC 23271) with some differences. I'll talk about them later on.

So, what does the CLI say? That only volatile operations are visible side-effects.

Note that it also says that non-volatile operations on the heap are side-effects as well, but not guaranteed to be visible. Just as important2, it doesn't state they're guaranteed to not be visible either.

What exactly happens on volatile operations? A volatile read has acquire semantics, it precedes any following memory reference. A volatile write has release semantics, it follows any preceding memory reference.

Acquiring a lock performs a volatile read, and releasing a lock performs a volatile write.
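The lock pairing described above can be sketched with a minimal self-contained example (note that the `Join()` here would already publish the worker's write on its own; the lock acquire/release pair is what's being illustrated):

```csharp
using System.Threading;

public static class LockPublication
{
    private static readonly object gate = new object();
    private static int payload; // plain field, deliberately not volatile

    public static int Run()
    {
        var worker = new Thread(() =>
        {
            lock (gate)          // acquiring the lock performs a volatile read
            {
                payload = 123;   // plain write inside the critical section
            }                    // releasing the lock performs a volatile write
        });
        worker.Start();
        worker.Join();
        lock (gate)              // this acquire pairs with the release above
        {
            return payload;
        }
    }
}
```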

Interlocked operations have acquire and release semantics.

There's another important term to learn, which is atomicity.

Reads and writes, volatile or not, are guaranteed to be atomic on primitive values up to 32 bits on 32-bit architectures and up to 64 bits on 64-bit architectures. They're also guaranteed to be atomic for references. For other types, such as larger structs, the operations are not atomic: they may require multiple, independent memory accesses.

However, even with volatile semantics, read-modify-write operations, such as v += 1 or the equivalent ++v (or v++, in terms of side-effects), are not atomic.

Interlocked operations guarantee atomicity for certain operations, typically addition, subtraction and compare-and-swap (CAS), i.e. write some value if and only if the current value is still some expected value. .NET also has an atomic Read(ref long) method for integers of 64 bits which works even in 32-bit architectures.
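As a hedged sketch of these guarantees (the wrapper names are illustrative; only standard Interlocked calls are used):

```csharp
using System.Threading;

public static class InterlockedSketch
{
    public static long SafeIncrement(ref long counter)
    {
        // Atomic read-modify-write; a plain counter++ could lose updates,
        // because it compiles to separate read, add, and write operations.
        return Interlocked.Increment(ref counter);
    }

    public static long SafeRead64(ref long value)
    {
        // Atomic 64-bit read even on 32-bit architectures,
        // where a plain read of a long may tear.
        return Interlocked.Read(ref value);
    }

    public static bool TrySetIfExpected(ref int target, int newValue, int expected)
    {
        // Compare-and-swap: writes newValue only if target still holds expected,
        // and reports whether the write happened.
        return Interlocked.CompareExchange(ref target, newValue, expected) == expected;
    }
}
```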

I'll keep referring to acquire semantics as volatile reads and release semantics as volatile writes, and either or both as volatile operations.

What does this all mean in terms of order?

That a volatile read is a point before which no memory references may cross, and a volatile write is a point after which no memory references may cross, both at the language level and at the machine level.

That non-volatile operations may cross to after following volatile reads if there are no volatile writes in between, and cross to before preceding volatile writes if there are no volatile reads in between.

That volatile operations within a thread are sequential and may not be reordered.

That volatile operations in a thread are made visible to all other threads in the same order. However, there is no total order of volatile operations from all threads, i.e. if one thread performs V1 and then V2, and another thread performs V3 and then V4, then any order that has V1 before V2 and V3 before V4 can be observed by any thread. In this case, it can be any of the following:

  • V1 V2 V3 V4

  • V1 V3 V2 V4

  • V1 V3 V4 V2

  • V3 V1 V2 V4

  • V3 V1 V4 V2

  • V3 V4 V1 V2

That is, any possible order of observed side-effects is valid for any thread for a single execution. There is no requirement of a total ordering, i.e. that all threads observe only one of the possible orders for a single execution.

How are things synchronized?

Essentially, it boils down to this: a synchronization point is where you have a volatile read that happens after a volatile write.

In practice, you must detect if a volatile read in one thread happened after a volatile write in another thread3. Here's a basic example:

public class InefficientEvent
{
    private volatile bool signalled = false;

    public void Signal()
    {
        signalled = true;
    }

    public void InefficientWait()
    {
        while (!signalled)
        {
        }
    }
}

Although generally inefficient, you can run two different threads, such that one calls InefficientWait() and the other calls Signal(), and the side-effects of the latter when it returns from Signal() become visible to the former when it returns from InefficientWait().
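Here is a self-contained sketch of the same pattern, where a static volatile flag plays the role of InefficientEvent's signalled field:

```csharp
using System.Threading;

public static class InefficientEventDemo
{
    private static volatile bool signalled; // the same idea as InefficientEvent
    private static int result;              // deliberately non-volatile

    public static int Run()
    {
        int observed = 0;
        var consumer = new Thread(() =>
        {
            while (!signalled) { }   // volatile reads (acquire)
            observed = result;       // guaranteed to see 42: the write below
                                     // was released by the volatile write
        });
        consumer.Start();
        result = 42;                 // plain write, published by...
        signalled = true;            // ...this volatile write (release)
        consumer.Join();
        return observed;
    }
}
```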

Volatile accesses are not as generally useful as interlocked accesses, which are not as generally useful as synchronization primitives. My advice is that you should develop code safely first, using synchronization primitives (locks, semaphores, mutexes, events, etc.) as needed, and if you find reasons to improve performance based on actual data (e.g. profiling), then and only then see if you can improve.

If you ever reach high contention for fast locks (used only for a few reads and writes without blocking), depending on the amount of contention, switching to interlocked operations may either improve or decrease performance. Especially so when you have to resort to compare-and-swap cycles, such as:

var currentValue = Volatile.Read(ref field);
var newValue = GetNewValue(currentValue);
var oldValue = currentValue;
var spinWait = new SpinWait();
while ((currentValue = Interlocked.CompareExchange(ref field, newValue, oldValue)) != oldValue)
{
    spinWait.SpinOnce();
    newValue = GetNewValue(currentValue);
    oldValue = currentValue;
}

Meaning, you have to profile the solution as well and compare with the current state. And be aware of the A-B-A problem.

There's also SpinLock, which you must really profile against monitor-based locks, because although they may make the current thread yield, they don't put the current thread to sleep, akin to the shown usage of SpinWait.
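For reference, a hedged sketch of SpinLock usage (the counter is illustrative; note the mandatory lockTaken pattern, and that SpinLock is a mutable struct, so the field must not be readonly):

```csharp
using System.Threading;

public class SpinLockedCounter
{
    // Must not be readonly: SpinLock is a mutable struct.
    private SpinLock spinLock = new SpinLock(enableThreadOwnerTracking: false);
    private int count;

    public int Value => count;

    public void Increment()
    {
        bool lockTaken = false;
        try
        {
            spinLock.Enter(ref lockTaken);
            count++;   // short critical section: spinning may beat sleeping here
        }
        finally
        {
            if (lockTaken) spinLock.Exit();
        }
    }
}
```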

Switching to volatile operations is like playing with fire. You must make sure through analytical proof that your code is correct, otherwise you may get burned when you least expect.

Usually, the best approach for optimization in the case of high contention is to avoid contention. For instance, to perform a transformation on a big list in parallel, it's often better to divide and delegate the problem to multiple work items that generate results which are merged in a final step, rather than having multiple threads locking the list for updates. This has a memory cost, so it depends on the length of the data set.
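The divide-and-merge idea can be sketched as follows (a minimal version: each index is owned by exactly one work item, so no thread ever contends on a shared lock; the transform function is a placeholder for real per-element work):

```csharp
using System;
using System.Threading.Tasks;

public static class PartitionedTransform
{
    // Each partition writes to its own slots of the result array,
    // so the "merge" is free and no locking of a shared list is needed.
    public static int[] Map(int[] input, Func<int, int> transform)
    {
        var results = new int[input.Length];
        Parallel.For(0, input.Length, i => results[i] = transform(input[i]));
        return results;
    }
}
```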


What are the differences between the C# specification and the CLI specification regarding volatile operations?

C# specifies side-effects, not mentioning their inter-thread visibility, as being a read or write of a volatile field, a write to a non-volatile variable, a write to an external resource, and the throwing of an exception.

C# specifies critical execution points at which these side-effects are preserved between threads: references to volatile fields, lock statements, and thread creation and termination.

If we take critical execution points as points where side-effects become visible, it adds to the CLI specification that thread creation and termination are visible side-effects, i.e. new Thread(...).Start() has release semantics on the current thread and acquire semantics at the start of the new thread, and exiting a thread has release semantics on the current thread and thread.Join() has acquire semantics on the waiting thread.
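These guarantees mean the plain (non-volatile) writes in the following sketch are safely published without any explicit fences, purely through Start() and Join():

```csharp
using System.Threading;

public static class StartJoinPublication
{
    public static int Run()
    {
        int data = 0;  // captured variable, stored on the heap by the closure

        data = 1;      // plain write before Start(), which has release semantics
        var worker = new Thread(() =>
        {
            data = data + 1;  // sees 1 (published by Start), writes 2
        });
        worker.Start();
        worker.Join();        // acquire semantics: the worker's writes are visible
        return data;          // 2
    }
}
```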

C# doesn't mention volatile operations in general, such as performed by classes in System.Threading instead of only through using fields declared as volatile and using the lock statement. I believe this is not intentional.

C# states that captured variables can be simultaneously exposed to multiple threads. The CLI doesn't mention this, because closures are a language construct.


1.

There are a few places where Microsoft (ex-)employees and MVPs state that writes have release semantics:

In my code, I ignore this implementation detail. I assume non-volatile writes are not guaranteed to become visible.


2.

There is a common misconception that you're allowed to introduce reads in C# and/or the CLI.

However, that is true only for local arguments and variables.

For static and instance fields, or arrays, or anything on the heap, you cannot sanely introduce reads, as such introduction may break the order of execution as seen from the current thread of execution, either from legitimate changes in other threads, or from changes through reflection.

That is, you can't turn this:

object local = field;
if (local != null)
{
    // code that reads local
}

into this:

if (field != null)
{
    // code that replaces reads on local with reads on field
}

if you can ever tell the difference. Specifically, a NullReferenceException being thrown by accessing local's members.

In the case of C#'s captured variables, they're equivalent to instance fields.

It's important to note that the CLI standard:

  • says that non-volatile accesses are not guaranteed to be visible

  • doesn't say that non-volatile accesses are guaranteed to not be visible

  • says that volatile accesses affect the visibility of non-volatile accesses

But you can turn this:

object local2 = local1;
if (local2 != null)
{
    // code that reads local2 on the assumption it's not null
}

into this:

if (local1 != null)
{
    // code that replaces reads on local2 with reads on local1,
    // as long as local1 and local2 have the same value
}

You can turn this:

var local = field;
local?.Method()

into this:

var local = field;
var _temp = local;
(_temp != null) ? _temp.Method() : null

or this:

var local = field;
(local != null) ? local.Method() : null

because you can't ever tell the difference. But again, you cannot turn it into this:

(field != null) ? field.Method() : null

I believe it was prudent of both specifications to state that an optimizing compiler may reorder reads and writes as long as a single thread of execution observes them as written, rather than to allow introducing and eliminating them in general.

Note that read elimination may be performed by either the C# compiler or the JIT compiler, i.e. multiple reads on the same non-volatile field, separated by instructions that don't write to that field and that don't perform volatile operations or equivalent, may be collapsed to a single read. It's as if a thread never synchronizes with other threads, so it keeps observing the same value:

public class Worker
{
    private bool working = false;
    private bool stop = false;

    public void Start()
    {
        if (!working)
        {
            new Thread(Work).Start();
            working = true;
        }
    }

    public void Work()
    {
        while (!stop)
        {
            // TODO: actual work without volatile operations
        }
    }

    public void Stop()
    {
        stop = true;
    }
}

There's no guarantee that Stop() will stop the worker. Microsoft's .NET implementation guarantees that stop = true; is a visible side-effect, but it doesn't guarantee that the read on stop inside Work() is not elided to this:

    public void Work()
    {
        bool localStop = stop;
        while (!localStop)
        {
            // TODO: actual work without volatile operations
        }
    }

That comment says quite a lot. To perform this optimization, the compiler must prove that there are no volatile operations whatsoever, either directly in the block, or indirectly in the entire call tree of methods and properties.

For this specific case, one correct implementation is to declare stop as volatile. But there are more options:

  • using the equivalent Volatile.Read and Volatile.Write

  • using Interlocked.CompareExchange

  • using a lock statement around accesses to stop

  • using something equivalent to a lock, such as a Mutex, or a Semaphore or SemaphoreSlim if you don't want the lock to have thread affinity, i.e. you can release it on a different thread than the one that acquired it

  • using a ManualResetEvent or ManualResetEventSlim instead of stop, in which case you can make Work() sleep with a timeout while waiting for a stop signal before the next iteration

and so on.
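One of those options, sketched with Volatile.Read and Volatile.Write (a minimal rewrite of Worker; the loop body is still a placeholder):

```csharp
using System.Threading;

public class StoppableWorker
{
    private bool stop; // accessed only through Volatile.Read/Volatile.Write

    public void Work()
    {
        // The volatile read cannot be hoisted out of the loop,
        // so the worker is guaranteed to eventually observe Stop().
        while (!Volatile.Read(ref stop))
        {
            // TODO: actual work
        }
    }

    public void Stop()
    {
        Volatile.Write(ref stop, true);
    }
}
```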


3.

One significant difference of .NET's volatile synchronization compared to Java's volatile synchronization is that Java requires you to use the same volatile location, whereas .NET only requires that an acquire (volatile read) happens after a release (volatile write). So, in principle you can synchronize in .NET with the following code, but you can't synchronize with the equivalent code in Java:

using System;
using System.Threading;

public class SurrealVolatileSynchronizer
{
    public volatile bool v1 = false;
    public volatile bool v2 = false;
    public int state = 0;

    public void DoWork1(object b)
    {
        var barrier = (Barrier)b;
        barrier.SignalAndWait();
        Thread.Sleep(100);
        state = 1;
        v1 = true;
    }

    public void DoWork2(object b)
    {
        var barrier = (Barrier)b;
        barrier.SignalAndWait();
        Thread.Sleep(200);
        bool currentV2 = v2;
        Console.WriteLine("{0}", state);
    }

    public static void Main(string[] args)
    {
        var synchronizer = new SurrealVolatileSynchronizer();
        var thread1 = new Thread(synchronizer.DoWork1);
        var thread2 = new Thread(synchronizer.DoWork2);
        var barrier = new Barrier(3);
        thread1.Start(barrier);
        thread2.Start(barrier);
        barrier.SignalAndWait();
        thread1.Join();
        thread2.Join();
    }
}

This surreal example expects threads and Thread.Sleep(int) to take an exact amount of time. If this is so, it synchronizes correctly, because DoWork2 performs a volatile read (acquire) after DoWork1 performs a volatile write (release).

In Java, even with such surreal expectations fulfilled, this would not guarantee synchronization. In DoWork2, you'd have to read from the same volatile field you wrote to in DoWork1.
