C# volatile variable: Memory fences VS. caching


Question

So I've researched the topic for quite some time now, and I think I understand the most important concepts, like the release and acquire memory fences.

However, I haven't found a satisfactory explanation for the relation between volatile and the caching of main memory.

So, I understand that every read and write to/from a volatile field enforces strict ordering of the read as well as the write operations that precede and follow it (read-acquire and write-release). But that only guarantees the ordering of the operations. It doesn't say anything about the time these changes are visible to other threads/processors. In particular, this depends on the time the cache is flushed (if at all). I remember having read a comment from Eric Lippert saying something along the lines of "the presence of volatile fields automatically disables cache optimizations". But I'm not sure what exactly this means. Does it mean caching is completely disabled for the whole program just because we have a single volatile field somewhere? If not, what is the granularity the cache is disabled for?

Also, I read something about strong and weak volatile semantics and that C# follows the strong semantics where every write will always go straight to main memory no matter if it's a volatile field or not. I am very confused about all of this.

Solution

I'll address the last question first. Microsoft's .NET implementation has release semantics on writes1. It's not C# per se, so the same program, no matter the language, in a different implementation can have weak non-volatile writes.

The visibility of side-effects is regarding multiple threads. Forget about CPUs, cores and caches. Imagine, instead, that each thread has a snapshot of what is on the heap that requires some sort of synchronization to communicate side-effects between threads.

So, what does C# say? The C# language specification (newer draft) says fundamentally the same as the Common Language Infrastructure standard (CLI; ECMA-335 and ISO/IEC 23271) with some differences. I'll talk about them later on.

So, what does the CLI say? That only volatile operations are visible side-effects.

Note that it also says that non-volatile operations on the heap are side-effects as well, but not guaranteed to be visible. Just as important2, it doesn't state they're guaranteed to not be visible either.

What exactly happens on volatile operations? A volatile read has acquire semantics, it precedes any following memory reference. A volatile write has release semantics, it follows any preceding memory reference.
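
To make acquire/release concrete, here's a minimal sketch (the Publisher class and its member names are hypothetical, not from any specification) using System.Threading.Volatile, whose Read and Write methods have exactly these semantics:

using System.Threading;

public class Publisher
{
    private int _data;   // plain, non-volatile payload
    private bool _ready; // accessed only through Volatile.Read/Volatile.Write

    public void Publish(int value)
    {
        _data = value;                    // ordinary write...
        Volatile.Write(ref _ready, true); // ...it cannot move below this release
    }

    public bool TryConsume(out int value)
    {
        if (Volatile.Read(ref _ready))    // acquire: later reads cannot move above it
        {
            value = _data;                // sees the value written before the release
            return true;
        }
        value = 0;
        return false;
    }
}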

Acquiring a lock performs a volatile read, and releasing a lock performs a volatile write.
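
As a sketch of the same publication done with a lock instead (hypothetical class, assuming only the guarantee just quoted), the acquire on entry and release on exit make the plain fields visible without declaring anything volatile:

public class LockedBox
{
    private readonly object _gate = new object();
    private int _data;
    private bool _ready;

    public void Publish(int value)
    {
        lock (_gate) // entering performs a volatile read (acquire)
        {
            _data = value;
            _ready = true;
        }            // exiting performs a volatile write (release)
    }

    public bool TryConsume(out int value)
    {
        lock (_gate)
        {
            value = _ready ? _data : 0;
            return _ready;
        }
    }
}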

Interlocked operations have acquire and release semantics.

There's another important term to learn, which is atomicity.

Reads and writes, volatile or not, are guaranteed to be atomic on primitive values up to 32 bits on 32-bit architectures and up to 64 bits on 64-bit architectures. They're also guaranteed to be atomic for references. For other types, such as long structs, the operations are not atomic, they may require multiple, independent memory accesses.

However, even with volatile semantics, read-modify-write operations, such as v += 1 or the equivalent ++v (or v++, in terms of side-effects), are not atomic.

Interlocked operations guarantee atomicity for certain operations, typically addition, subtraction and compare-and-swap (CAS), i.e. write some value if and only if the current value is still the expected value. .NET also has an atomic Interlocked.Read(ref long) method for 64-bit integers, which works even on 32-bit architectures.
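
A hedged sketch of the distinction (the Counters class is hypothetical): the increment would not be atomic even with volatile semantics, while the Interlocked calls are:

using System.Threading;

public class Counters
{
    private long _hits; // 64-bit: plain reads/writes may tear on 32-bit architectures

    public void Increment()
    {
        // _hits += 1; would NOT be atomic even with volatile semantics:
        // it is a separate read, add and write.
        Interlocked.Increment(ref _hits); // atomic read-modify-write
    }

    public long Read()
    {
        return Interlocked.Read(ref _hits); // atomic 64-bit read, even on 32-bit CPUs
    }
}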

I'll keep referring to acquire semantics as volatile reads and release semantics as volatile writes, and either or both as volatile operations.

What does this all mean in terms of order?

That a volatile read is a point before which no memory references may cross, and a volatile write is a point after which no memory references may cross, both at the language level and at the machine level.

That non-volatile operations may cross to after following volatile reads if there are no volatile writes in between, and cross to before preceding volatile writes if there are no volatile reads in between.

That volatile operations within a thread are sequential and may not be reordered.

That volatile operations in a thread are made visible to all other threads in the same order. However, there is no total order of volatile operations from all threads, i.e. if one thread performs V1 and then V2, and another thread performs V3 and then V4, then any order that has V1 before V2 and V3 before V4 can be observed by any thread. In this case, it can be any of the following:

  • V1 V2 V3 V4

  • V1 V3 V2 V4

  • V1 V3 V4 V2

  • V3 V1 V2 V4

  • V3 V1 V4 V2

  • V3 V4 V1 V2

That is, any possible order of observed side-effects is valid for any thread for a single execution. There is no requirement on total ordering, i.e. there is no guarantee that all threads observe only one of the possible orders for a single execution.
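
As an illustration of the list above, here's a hypothetical sketch of the two writer threads; any reader may observe any of the six orders, but never V2 before V1 or V4 before V3:

public class OrderingExample
{
    public volatile bool v1, v2, v3, v4;

    public void ThreadA()
    {
        v1 = true; // V1
        v2 = true; // V2: no thread ever observes V2 before V1
    }

    public void ThreadB()
    {
        v3 = true; // V3
        v4 = true; // V4: no thread ever observes V4 before V3
    }
}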

How are things synchronized?

Essentially, it boils down to this: a synchronization point is where you have a volatile read that happens after a volatile write.

In practice, you must detect if a volatile read in one thread happened after a volatile write in another thread3. Here's a basic example:

public class InefficientEvent
{
    private volatile bool signalled = false;

    public void Signal()
    {
        signalled = true;
    }

    public void InefficientWait()
    {
        while (!signalled)
        {
            // spin until the volatile write in Signal() becomes visible
        }
    }
}

Although generally inefficient, you can run two different threads, such that one calls InefficientWait() and the other calls Signal(), and the side-effects of the latter when it returns from Signal() become visible to the former when it returns from InefficientWait().
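
A usage sketch under those assumptions (the variable names are hypothetical; Signal() may run before or after the waiter starts spinning):

using System.Threading;

public static class Program
{
    public static void Main()
    {
        var evt = new InefficientEvent();
        var waiter = new Thread(() =>
        {
            evt.InefficientWait();
            // side-effects performed before Signal() are visible here
        });
        waiter.Start();
        evt.Signal();
        waiter.Join();
    }
}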

Volatile accesses are not as generally useful as interlocked accesses, which are not as generally useful as synchronization primitives. My advice is that you should develop code safely first, using synchronization primitives (locks, semaphores, mutexes, events, etc.) as needed, and if you find reasons to improve performance based on actual data (e.g. profiling), then and only then see if you can improve.

If you ever reach high contention for fast locks (used only for a few reads and writes without blocking), depending on the amount of contention, switching to interlocked operations may either improve or decrease performance. Especially so when you have to resort to compare-and-swap cycles, such as:

var currentValue = Volatile.Read(ref field);
var newValue = GetNewValue(currentValue);
var oldValue = currentValue;
var spinWait = new SpinWait();
while ((currentValue = Interlocked.CompareExchange(ref field, newValue, oldValue)) != oldValue)
{
    spinWait.SpinOnce();
    newValue = GetNewValue(currentValue);
    oldValue = currentValue;
}

Meaning, you have to profile the solution as well and compare with the current state. And be aware of the A-B-A problem.
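
For a self-contained instance of that compare-and-swap cycle, consider this hypothetical atomic-maximum helper; here the A-B-A problem doesn't bite, because only the current value matters, not its history:

using System.Threading;

public static class AtomicMax
{
    private static int _max;

    public static void Update(int candidate)
    {
        var spinWait = new SpinWait();
        var currentValue = Volatile.Read(ref _max);
        while (candidate > currentValue)
        {
            var witnessed = Interlocked.CompareExchange(ref _max, candidate, currentValue);
            if (witnessed == currentValue)
                return;               // our compare-and-swap won
            currentValue = witnessed; // lost the race: retry against the newer value
            spinWait.SpinOnce();
        }
    }
}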

There's also SpinLock, which you must really profile against monitor-based locks, because although a SpinLock may make the current thread yield, it doesn't put the current thread to sleep, akin to the shown usage of SpinWait.
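
A usage sketch for SpinLock (the counter class is hypothetical; note the lockTaken pattern that SpinLock.Enter requires):

using System.Threading;

public class SpinLockedCounter
{
    // SpinLock is a mutable struct: never make the field readonly.
    private SpinLock _spinLock = new SpinLock(enableThreadOwnerTracking: false);
    private int _count;

    public void Increment()
    {
        bool lockTaken = false;
        try
        {
            _spinLock.Enter(ref lockTaken);
            _count++;
        }
        finally
        {
            if (lockTaken) _spinLock.Exit();
        }
    }
}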

Switching to volatile operations is like playing with fire. You must make sure through analytical proof that your code is correct, otherwise you may get burned when you least expect it.

Usually, the best approach for optimization in the case of high contention is to avoid contention. For instance, to perform a transformation on a big list in parallel, it's often better to divide and delegate the problem to multiple work items that generate results which are merged in a final step, rather than having multiple threads locking the list for updates. This has a memory cost, so it depends on the length of the data set.
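
Here's a rough sketch of that divide-and-merge shape (the helper is hypothetical), using the Parallel.For overload with per-worker local state so no thread locks a shared result list per item:

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class PartitionedTransform
{
    // Each worker accumulates into its own local list; the partial lists are
    // merged once at the end instead of locking a shared list per update.
    public static List<TResult> Run<T, TResult>(IList<T> input, Func<T, TResult> transform)
    {
        var partials = new ConcurrentBag<List<TResult>>();

        Parallel.For(0, input.Count,
            () => new List<TResult>(),                                    // per-worker state
            (i, _, local) => { local.Add(transform(input[i])); return local; },
            local => partials.Add(local));                                // one merge per worker

        var merged = new List<TResult>();
        foreach (var part in partials)
            merged.AddRange(part);
        return merged; // note: ordering across partitions is not preserved
    }
}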


What are the differences between the C# specification and the CLI specification regarding volatile operations?

C# specifies side-effects, not mentioning their inter-thread visibility, as being a read or write of a volatile field, a write to a non-volatile variable, a write to an external resource, and the throwing of an exception.

C# specifies critical execution points at which these side-effects are preserved between threads: references to volatile fields, lock statements, and thread creation and termination.

If we take critical execution points as points where side-effects become visible, this adds to the CLI specification that thread creation and termination are visible side-effects: new Thread(...).Start() has release semantics on the current thread and acquire semantics at the start of the new thread; exiting a thread has release semantics on the current thread; and thread.Join() has acquire semantics on the waiting thread (a sketch follows these points).

C# doesn't mention volatile operations in general, such as those performed by the classes in System.Threading, as opposed to only those performed through fields declared as volatile and through the lock statement. I believe this is not intentional.

C# states that captured variables can be simultaneously exposed to multiple threads. The CIL doesn't mention it, because closures are a language construct.
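
As referenced above, here's a hedged sketch of the Start()/Join() guarantee (the HandOff class is hypothetical): the plain fields need no volatile, because Start() has release semantics and Join() has acquire semantics:

using System.Threading;

public class HandOff
{
    private int _input;  // plain fields: no volatile needed for this hand-off
    private int _result;

    public int Compute(int value)
    {
        _input = value; // written before Start(), which has release semantics
        var worker = new Thread(() => _result = _input * 2);
        worker.Start();
        worker.Join();  // acquire semantics: _result is visible here
        return _result;
    }
}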


1.

There are a few places where Microsoft (ex-)employees and MVPs state that writes have release semantics.

In my code, I ignore this implementation detail. I assume non-volatile writes are not guaranteed to become visible.


2.

There is a common misconception that you're allowed to introduce reads in C# and/or the CLI.

However, that is true only for local arguments and variables.

For static and instance fields, or arrays, or anything on the heap, you cannot sanely introduce reads, as such introduction may break the order of execution as seen from the current thread of execution, either from legitimate changes in other threads, or from changes through reflection.

That is, you can't turn this:

object local = field;
if (local != null)
{
    // code that reads local
}

into this:

if (field != null)
{
    // code that replaces reads on local with reads on field
}

if you can ever tell the difference. Specifically, a NullReferenceException being thrown by accessing local's members.
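
To see why, here's a hypothetical sketch of the hazard: if another thread can write null concurrently, an introduced re-read of field where the source read local could throw where the original could not:

public class IntroducedReadHazard
{
    public object field = new object();

    public void Reader()
    {
        object local = field; // a single read: local is stable from here on
        if (local != null)
        {
            local.ToString(); // safe: local cannot become null
        }
        // An introduced re-read of `field` in place of `local` could observe
        // the null written by Clear() and throw NullReferenceException.
    }

    public void Clear()
    {
        field = null; // may run concurrently with Reader()
    }
}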

In the case of C#'s captured variables, they're equivalent to instance fields.

It's important to note that the CLI standard:

  • says that non-volatile accesses are not guaranteed to be visible

  • doesn't say that non-volatile accesses are guaranteed to not be visible

  • says that volatile accesses affect the visibility of non-volatile accesses

But you can turn this:

object local2 = local1;
if (local2 != null)
{
    // code that reads local2 on the assumption it's not null
}

into this:

if (local1 != null)
{
    // code that replaces reads on local2 with reads on local1,
    // as long as local1 and local2 have the same value
}

You can turn this:

var local = field;
local?.Method()

into this:

var local = field;
var _temp = local;
(_temp != null) ? _temp.Method() : null

or this:

var local = field;
(local != null) ? local.Method() : null

because you can't ever tell the difference. But again, you cannot turn it into this:

(field != null) ? field.Method() : null

I believe it was prudent for both specifications to state that an optimizing compiler may reorder reads and writes as long as a single thread of execution observes them as written, instead of generally allowing it to introduce and eliminate them altogether.

Note that read elimination may be performed by either the C# compiler or the JIT compiler, i.e. multiple reads on the same non-volatile field, separated by instructions that don't write to that field and that don't perform volatile operations or equivalent, may be collapsed to a single read. It's as if a thread never synchronizes with other threads, so it keeps observing the same value:

public class Worker
{
    private bool working = false;
    private bool stop = false;

    public void Start()
    {
        if (!working)
        {
            new Thread(Work).Start();
            working = true;
        }
    }

    public void Work()
    {
        while (!stop)
        {
            // TODO: actual work without volatile operations
        }
    }

    public void Stop()
    {
        stop = true;
    }
}

There's no guarantee that Stop() will stop the worker. Microsoft's .NET implementation guarantees that stop = true; is a visible side-effect, but it doesn't guarantee that the read on stop inside Work() is not elided to this:

    public void Work()
    {
        bool localStop = stop;
        while (!localStop)
        {
            // TODO: actual work without volatile operations
        }
    }

That comment says quite a lot. To perform this optimization, the compiler must prove that there are no volatile operations whatsoever, either directly in the block, or indirectly in the whole methods and properties call tree.

For this specific case, one correct implementation is to declare stop as volatile. But there are more options (a sketch of the last one follows this list):

  • using the equivalent Volatile.Read and Volatile.Write

  • using Interlocked.CompareExchange

  • using a lock statement around accesses to stop

  • using something equivalent to a lock, such as a Mutex, or a Semaphore or SemaphoreSlim if you don't want the lock to have thread-affinity, i.e. you can release it on a different thread than the one that acquired it

  • using a ManualResetEvent or ManualResetEventSlim instead of stop, in which case you can make Work() sleep with a timeout while waiting for a stop signal before the next iteration, etc.
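
Here's a sketch of that last option (the EventWorker class is hypothetical, mirroring the Worker example above):

using System.Threading;

public class EventWorker
{
    private readonly ManualResetEventSlim _stop = new ManualResetEventSlim(false);

    public void Start() => new Thread(Work).Start();

    public void Work()
    {
        // Wait() returns true once _stop is set; otherwise it times out and loops.
        while (!_stop.Wait(millisecondsTimeout: 100))
        {
            // TODO: actual work (as in the original example)
        }
    }

    public void Stop() => _stop.Set();
}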


3.

One significant difference of .NET's volatile synchronization compared to Java's volatile synchronization is that Java requires you to use the same volatile location, whereas .NET only requires that an acquire (volatile read) happens after a release (volatile write). So, in principle you can synchronize in .NET with the following code, but you can't synchronize with the equivalent code in Java:

using System;
using System.Threading;

public class SurrealVolatileSynchronizer
{
    public volatile bool v1 = false;
    public volatile bool v2 = false;
    public int state = 0;

    public void DoWork1(object b)
    {
        var barrier = (Barrier)b;
        barrier.SignalAndWait();
        Thread.Sleep(100);
        state = 1;
        v1 = true;
    }

    public void DoWork2(object b)
    {
        var barrier = (Barrier)b;
        barrier.SignalAndWait();
        Thread.Sleep(200);
        bool currentV2 = v2;
        Console.WriteLine("{0}", state);
    }

    public static void Main(string[] args)
    {
        var synchronizer = new SurrealVolatileSynchronizer();
        var thread1 = new Thread(synchronizer.DoWork1);
        var thread2 = new Thread(synchronizer.DoWork2);
        var barrier = new Barrier(3);
        thread1.Start(barrier);
        thread2.Start(barrier);
        barrier.SignalAndWait();
        thread1.Join();
        thread2.Join();
    }
}

This surreal example expects threads and Thread.Sleep(int) to take an exact amount of time. If this is so, it synchronizes correctly, because DoWork2 performs a volatile read (acquire) after DoWork1 performs a volatile write (release).

In Java, even with such surreal expectations fulfilled, this would not guarantee synchronization. In DoWork2, you'd have to read from the same volatile field you wrote to in DoWork1.
