What's the purpose of a compiler barrier?


Question

The following is excerpted from Concurrent Programming on Windows, Chapter 10, pages 528-529: a C++ template double-checked locking implementation.

T getValue(){
    if (!m_pValue){
        EnterCriticalSection(&m_crst);
        if (! m_pValue){
            T pValue = m_pFactory();
            _WriteBarrier();
            m_pValue = pValue;                  
        }
        LeaveCriticalSection(&m_crst);
    }
    _ReadBarrier();
    return m_pValue;
}

The author states:

A _WriteBarrier is found after instantiating the object, but before writing a pointer to it in the m_pValue field. That's required to ensure that writes in the initialization of the object never get delayed past the write to m_pValue itself.

Since _WriteBarrier is a compiler barrier, I don't think it is useful if the compiler knows the semantics of LeaveCriticalSection. The compiler may well omit the write to pValue, but it would never move the assignment ahead of the function call, since that would violate the program's semantics. I believe LeaveCriticalSection has an implicit hardware fence, and hence any write before the assignment to m_pValue will be synchronized.

On the other hand, if the compiler doesn't know the semantics of LeaveCriticalSection, the _WriteBarrier is needed on all platforms to prevent the compiler from moving the assignment out of the critical section.

And for _ReadBarrier, the author says:

Similarly, we need a _ReadBarrier just before returning m_value so that loads after the call to getValue are not reordered to occur before the call.
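For comparison, here is a minimal sketch of the same double-checked pattern in portable C++11, where the two barriers become a release store and an acquire load on a std::atomic pointer. This is not the book's code: the class name LazyInit, the std::mutex in place of the critical section, the T* return type, and the helpers makeAnswer and demoLazyInit are all assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical stand-in for the book's lazy-initialization class.
template <typename T>
class LazyInit {
public:
    using Factory = T* (*)();

    explicit LazyInit(Factory factory) : m_pFactory(factory) {}
    ~LazyInit() { delete m_pValue.load(std::memory_order_relaxed); }

    LazyInit(const LazyInit&) = delete;
    LazyInit& operator=(const LazyInit&) = delete;

    T* getValue() {
        // Acquire load: later reads through p cannot be moved before it.
        // This plays the role the author assigns to _ReadBarrier.
        T* p = m_pValue.load(std::memory_order_acquire);
        if (!p) {
            std::lock_guard<std::mutex> lock(m_mutex);
            // Re-check under the lock. Relaxed is enough here: the mutex
            // already orders this load against the store made by whichever
            // thread held the lock before.
            p = m_pValue.load(std::memory_order_relaxed);
            if (!p) {
                p = m_pFactory();
                // Release store: the factory's writes cannot be moved
                // after it. This plays the role of _WriteBarrier.
                m_pValue.store(p, std::memory_order_release);
            }
        }
        return p;
    }

private:
    Factory m_pFactory;
    std::mutex m_mutex;
    std::atomic<T*> m_pValue{nullptr};
};

// Illustrative factory and usage.
inline int* makeAnswer() { return new int(42); }

inline int demoLazyInit() {
    LazyInit<int> lazy(makeAnswer);
    int* first = lazy.getValue();
    int* second = lazy.getValue();
    assert(first == second);  // the factory ran only once
    return *first;
}
```

Unlike the intrinsics in the excerpt, the atomic operations constrain both the compiler and the hardware, so the sketch does not depend on what the compiler knows about the lock functions.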

First, if this function is in a library and no source code is available, how does the compiler know whether there is a compiler barrier or not?

Second, if the barrier is needed at all, it is in the wrong place: I think it should go right after EnterCriticalSection to express an acquire fence. As with what I wrote above, this depends on whether the compiler understands EnterCriticalSection's semantics.

The author also says:

However, I will also point out that neither fence is required on X86, Intel64, and AMD64 processors. It's unfortunate that weak processors like IA64 have muddied the waters.

As I analyzed above, if we need those barriers on some platforms, then we need them on all platforms: they are compiler barriers, so they only make sure the compiler performs correct optimizations in case it does not understand the semantics of some functions.

Please correct me if I am wrong.

Another question: is there any reference for MSVC and GCC that lists the functions whose synchronization semantics they understand?

Update 1: According to the answer (m_pValue will be accessed outside the critical section), and after running the sample code from here, I think:

  1. I think what the author means here is a hardware fence rather than a compiler barrier; see the following quote from MSDN.
  2. I believe a hardware fence also acts as an implicit compiler barrier (it disables compiler optimization), but not vice versa (see here: using a CPU fence, no reordering is observed, but not the other way around).

A Barrier is not a fence. It should be noted that a Barrier effects everything in cache. A fence effects a single cache line.

You should not be adding barriers unless absolutely necessary. To use a fence, you can select one of the _Interlocked intrinsic functions.

As the author wrote, "neither fence is required on X86, Intel64, and AMD64 processors"; this is because those platforms only allow store-load reordering.
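The distinction between a compiler-only barrier and a hardware fence can be sketched with the standard C++11 fences, used here as stand-ins for the MSVC intrinsics. std::atomic_signal_fence restricts only compiler reordering, while std::atomic_thread_fence restricts the compiler and the CPU. The globals g_payload and g_ready and the function names are hypothetical, chosen only for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int g_payload = 0;                 // plain, non-atomic data
std::atomic<bool> g_ready{false};  // publication flag

// Compiler-only barrier: stops the compiler from reordering across it but
// issues no CPU fence instruction. By the standard it only orders a thread
// against a signal handler running on that same thread.
inline void compilerOnlyBarrier() {
    std::atomic_signal_fence(std::memory_order_seq_cst);
}

void producer() {
    g_payload = 42;
    // Compiler and hardware fence: the write above stays before the store below.
    std::atomic_thread_fence(std::memory_order_release);
    g_ready.store(true, std::memory_order_relaxed);
}

int consumer() {
    while (!g_ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    // Pairs with the release fence: the read below stays after the load above.
    std::atomic_thread_fence(std::memory_order_acquire);
    return g_payload;
}

int demoFences() {
    g_payload = 0;
    g_ready.store(false);
    int seen = 0;
    std::thread reader([&seen] { seen = consumer(); });
    std::thread writer(producer);
    writer.join();
    reader.join();
    assert(seen == 42);
    return seen;
}
```

On x86 the release and acquire fences typically compile to no instruction at all, which matches the author's remark, but they still constrain the compiler. On weakly ordered hardware they emit real fence instructions.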

One question remains: does the compiler understand the semantics of the calls to Enter/LeaveCriticalSection? If it doesn't, it may optimize as described in the answer below, which would cause bad behavior.

Thanks

Recommended answer

tl;dr:
The factory call could well take several steps, some of which may be moved after the assignment to m_pValue. The expression !m_pValue would then return false before the factory call is complete, giving an incompletely initialized return value in the second thread.

Explanation:

The compiler may well omit the write to pValue, but it would never move the assignment ahead of the function call, since that would violate the program's semantics.

Not necessarily. Suppose T is int*, and the factory method creates a new int and initializes it with 42.

int* pValue = new int(42);
m_pValue = pValue;         
//m_pValue now points to a newly allocated int with value 42.

For the compiler, the new expression is several steps, some of which can be moved relative to the others. Its semantics are allocation, initialization, and then assignment of the address to pValue:

int* pTmp = new int;
*pTmp = 42;
int* pValue = pTmp;

In a sequential program, the semantics do not change if some of these statements are moved after others. In particular, the assignments can be moved freely between the memory allocation and the first access, i.e. the first dereference of one of the pointers. That includes moving the initialization after the pointer assignments that follow the new expression:

int* pTmp = new int;
int* pValue = pTmp;
m_pValue = pValue;  
*pTmp = 42;
//m_pValue now points to a newly allocated int with value 42.

The compiler will probably do that to optimize most of the temporary pointers away:

m_pValue = new int;  
*m_pValue = 42;
//m_pValue now points to a newly allocated int with value 42.

These are the correct semantics for a sequential program.

I believe LeaveCriticalSection has an implicit hardware fence, and hence any write before the assignment to m_pValue will be synchronized.

No. The fence comes after the assignment to m_pValue, but the compiler can still move the integer assignment between that assignment and the fence:

m_pValue = new int;  
*m_pValue = 42;
LeaveCriticalSection();

And that's too late, because Thread 2 does not need to enter the critical section:

Thread 1:                | Thread 2:
                         |
m_pValue = new int;      | 
                         | if (!m_pValue){     //already false
                         | }
                         | return m_pValue;
                         | /*use *m_pValue */
*m_pValue = 42;          |
LeaveCriticalSection();  |
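One way to rule out the interleaving in the timeline above, sketched here in standard C++ rather than with the MSVC intrinsics, is to publish the pointer with a release store and read it with an acquire load. The names g_pValue, thread1Publish, thread2Read and demoPublish are hypothetical and only mirror the two threads in the timeline.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int*> g_pValue{nullptr};  // stands in for m_pValue

void thread1Publish() {
    int* pTmp = new int;  // allocation
    *pTmp = 42;           // initialization
    // Release store: neither the compiler nor the CPU may move the
    // initialization above to after this publication.
    g_pValue.store(pTmp, std::memory_order_release);
}

int thread2Read() {
    int* p = nullptr;
    // Acquire load: once a non-null pointer is seen, the initialization
    // that preceded the release store is visible as well.
    while ((p = g_pValue.load(std::memory_order_acquire)) == nullptr) {
        std::this_thread::yield();
    }
    return *p;
}

int demoPublish() {
    std::thread writer(thread1Publish);
    int seen = thread2Read();
    writer.join();
    delete g_pValue.exchange(nullptr);  // clean up so the demo can be rerun
    assert(seen == 42);
    return seen;
}
```

With this pairing, Thread 2 can never use *m_pValue before the 42 has been written, whether or not it ever takes the lock.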
