When to use volatile to counteract compiler optimizations in C#


Question


I have spent an extensive number of weeks doing multithreaded coding in C# 4.0. However, there is one question that remains unanswered for me.

I understand that the volatile keyword prevents the compiler from storing variables in registers, thus avoiding inadvertently reading stale values. Writes are always volatile in .NET, so any documentation stating that it also avoids stale writes is redundant.

I also know that the compiler optimization is somewhat "unpredictable". The following code will illustrate a stall due to a compiler optimization (when running the release compile outside of VS):

using System.Threading;

class Test
{
    public struct Data
    {
        public int _loop;
    }

    public static Data data;

    public static void Main()
    {
        data._loop = 1;
        Test test1 = new Test();

        new Thread(() =>
        {
            data._loop = 0;
        }
        ).Start();

        do
        {
            if (data._loop != 1)
            {
                break;
            }

            //Thread.Yield();
        } while (true);

        // will never terminate
    }
}

The code behaves as expected. However, if I uncomment the //Thread.Yield(); line, then the loop will exit.

Further, if I put a Sleep statement before the do loop, it will exit. I don't get it.

Naturally, decorating _loop with volatile will also cause the loop to exit (in its shown pattern).
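
For completeness, the volatile fix mentioned above is a one-line change to the struct from the question (volatile is legal on an int instance field, including inside a struct):

```csharp
public struct Data
{
    // volatile forbids the JIT from caching the value in a register
    // or hoisting the read out of the polling loop.
    public volatile int _loop;
}
```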

My question is: What are the rules the compiler follows in order to determine when to implicitly perform a volatile read? And why can I still get the loop to exit with what I consider to be odd measures?

EDIT

IL for code as shown (stalls):

L_0038: ldsflda valuetype ConsoleApplication1.Test/Data ConsoleApplication1.Test::data
L_003d: ldfld int32 ConsoleApplication1.Test/Data::_loop
L_0042: ldc.i4.1 
L_0043: beq.s L_0038
L_0045: ret 

IL with Yield() (does not stall):

L_0038: ldsflda valuetype ConsoleApplication1.Test/Data ConsoleApplication1.Test::data
L_003d: ldfld int32 ConsoleApplication1.Test/Data::_loop
L_0042: ldc.i4.1 
L_0043: beq.s L_0046
L_0045: ret 
L_0046: call bool [mscorlib]System.Threading.Thread::Yield()
L_004b: pop 
L_004c: br.s L_0038

Solution

What are the rules the compiler follows in order to determine when to implicitly perform a volatile read?

First, it is not just the compiler that moves instructions around. The big 3 actors in play that cause instruction reordering are:

  • Compiler (like C# or VB.NET)
  • Runtime (like the CLR or Mono)
  • Hardware (like x86 or ARM)

The rules at the hardware level are a little more cut and dry in that they are usually documented pretty well. But, at the runtime and compiler levels there are memory model specifications that provide constraints on how instructions can get reordered, but it is left up to the implementers to decide how aggressively they want to optimize the code and how closely they want to toe the line with respect to the memory model constraints.

For example, the ECMA specification for the CLI provides fairly weak guarantees. But Microsoft decided to tighten those guarantees in the .NET Framework CLR. Other than a few blog posts I have not seen much formal documentation on the rules the CLR adheres to. Mono, of course, might use a different set of rules that may or may not bring it closer to the ECMA specification. And of course, there may be some liberty in changing the rules in future releases as long as the formal ECMA specification is still considered.

With all of that said I have a few observations:

  • Compiling with the Release configuration is more likely to cause instruction reordering.
  • Simpler methods are more likely to have their instructions reordered.
  • Hoisting a read from inside a loop to outside of the loop is a typical type of reordering optimization.
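
The hoisting in that last bullet can be sketched as a before/after transformation (illustrative only; the real output is the IL the asker posted):

```csharp
// What the source says:
while (data._loop == 1) { }

// What the JIT is effectively allowed to emit, because the loop body
// never writes _loop and the field is not volatile:
if (data._loop == 1)
{
    while (true) { }   // the read has been hoisted out of the loop
}
```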

And why can I still get the loop to exit with what I consider to be odd measures?

It is because those "odd measures" are doing one of two things:

  • generating an implicit memory barrier
  • circumventing the compiler's or runtime's ability to perform certain optimizations

For example, if the code inside a method gets too complex it may prevent the JIT compiler from performing certain optimizations that reorder instructions. You can think of it as sort of like how complex methods also do not get inlined.
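
One way to poke at this effect deliberately (a sketch for experimentation, not something to rely on): MethodImplOptions.NoOptimization asks the JIT not to optimize a method, which should also suppress hoisting the read. Whether a given CLR version keeps the loop reading with this attribute is something to verify empirically.

```csharp
using System.Runtime.CompilerServices;

// Asking the JIT to skip optimization of this method should also
// suppress hoisting the read of data._loop out of the loop.
[MethodImpl(MethodImplOptions.NoOptimization)]
static void SpinUntilCleared()
{
    while (Test.data._loop == 1)
    {
        // spin; the unoptimized read happens on every iteration
    }
}
```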

Also, things like Thread.Yield and Thread.Sleep create implicit memory barriers. I have started a list of such mechanisms here. I bet if you put a Console.WriteLine call in your code it would also cause the loop to exit. I have also seen the "non-terminating loop" example behave differently in different versions of the .NET Framework. For example, I bet if you ran that code in 1.0 it would terminate.

This is why using Thread.Sleep to simulate thread interleaving could actually mask a memory barrier problem.
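
An explicit alternative to leaning on those implicit barriers is to put the fence in the loop yourself (a hypothetical variant of the question's loop):

```csharp
do
{
    if (data._loop != 1)
    {
        break;
    }

    // A full fence: the JIT cannot move the read of data._loop across
    // it, so the field is re-read on every iteration.
    Thread.MemoryBarrier();
} while (true);
```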

Update:

After reading through some of your comments I think you may be confused as to what Thread.MemoryBarrier is actually doing. What it does is create a full-fence barrier. What does that mean exactly? A full-fence barrier is the composition of two half-fences: an acquire-fence and a release-fence. I will define them now.

  • Acquire fence: A memory barrier in which other reads & writes are not allowed to move before the fence.
  • Release fence: A memory barrier in which other reads & writes are not allowed to move after the fence.

So when you see a call to Thread.MemoryBarrier it will prevent all reads & writes from being moved either above or below the barrier. It will also emit whatever CPU-specific instructions are required.

If you look at the code for Thread.VolatileRead here is what you will see.

public static int VolatileRead(ref int address)
{
    int num = address;
    MemoryBarrier();
    return num;
}

Now you may be wondering why the MemoryBarrier call is after the actual read. Your intuition may tell you that to get a "fresh" read of address you would need the call to MemoryBarrier to occur before that read. But, alas, your intuition is wrong! The specification says a volatile read should produce an acquire-fence barrier. And per the definition I gave you above, that means the call to MemoryBarrier has to be after the read of address to prevent other reads and writes from being moved before it. You see, volatile reads are not strictly about getting a "fresh" read. They are about preventing the movement of instructions. This is incredibly confusing; I know.
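
Applied to the question's loop, Thread.VolatileRead gives the same acquire semantics without marking the field volatile (a sketch):

```csharp
do
{
    // Performs the read, then erects the fence, exactly as the
    // decompiled body above shows.
    if (Thread.VolatileRead(ref data._loop) != 1)
    {
        break;
    }
} while (true);
```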
