Why is a volatile local variable optimised differently from a volatile argument, and why does the optimiser generate a no-op loop from the latter?


Question

    Background

    This was inspired by this question/answer and ensuing discussion in the comments: Is the definition of "volatile" this volatile, or is GCC having some standard compliancy problems?. Based on others' and my interpretation of what should be happening, as discussed in comments, I've submitted it to GCC Bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71793 Other relevant responses are still welcome.

    Also, that thread has since given rise to this question: Does accessing a declared non-volatile object through a volatile reference/pointer confer volatile rules upon said accesses?

    Intro

    I know volatile isn't what most people think it is and is an implementation-defined nest of vipers. And I certainly don't want to use the below constructs in any real code. That said, I'm totally baffled by what's going on in these examples, so I'd really appreciate any elucidation.

    My guess is this is due to either highly nuanced interpretation of the Standard or (more likely?) just corner-cases for the optimiser used. Either way, while more academic than practical, I hope this is deemed valuable to analyse, especially given how typically misunderstood volatile is. Some more data points - or perhaps more likely, points against it - must be good.

    Input

    Given this code:

    #include <cstddef>
    
    void f(void *const p, std::size_t n)
    {
        unsigned char *y = static_cast<unsigned char *>(p);
        volatile unsigned char const x = 42;
        // N.B. Yeah, const is weird, but it doesn't change anything
    
        while (n--) {
            *y++ = x;
        }
    }
    
    void g(void *const p, std::size_t n, volatile unsigned char const x)
    {
        unsigned char *y = static_cast<unsigned char *>(p);
    
        while (n--) {
            *y++ = x;
        }
    }
    
    void h(void *const p, std::size_t n, volatile unsigned char const &x)
    {
        unsigned char *y = static_cast<unsigned char *>(p);
    
        while (n--) {
            *y++ = x;
        }
    }
    
    int main(int, char **)
    {
        int y[1000];
        f(&y, sizeof y);
        volatile unsigned char const x{99};
        g(&y, sizeof y, x);
        h(&y, sizeof y, x);
    }
    

    Output

    g++ from gcc (Debian 4.9.2-10) 4.9.2 (Debian stable a.k.a. Jessie) with the command line g++ -std=c++14 -O3 -S test.cpp produces the below ASM for main(). Version Debian 5.4.0-6 (current unstable) produces equivalent code, but I just happened to run the older one first, so here it is:

    main:
    .LFB3:
        .cfi_startproc
    
    # f()
        movb    $42, -1(%rsp)
        movl    $4000, %eax
        .p2align 4,,10
        .p2align 3
    .L21:
        subq    $1, %rax
        movzbl  -1(%rsp), %edx
        jne .L21
    
    # x = 99
        movb    $99, -2(%rsp)
        movzbl  -2(%rsp), %eax
    
    # g()
        movl    $4000, %eax
        .p2align 4,,10
        .p2align 3
    .L22:
        subq    $1, %rax
        jne .L22
    
    # h()
        movl    $4000, %eax
        .p2align 4,,10
        .p2align 3
    .L23:
        subq    $1, %rax
        movzbl  -2(%rsp), %edx
        jne .L23
    
    # return 0;
        xorl    %eax, %eax
        ret
        .cfi_endproc
    

    Analysis

    All 3 functions are inlined, and both that allocate volatile local variables do so on the stack for fairly obvious reasons. But that's about the only thing they share...

    • f() makes sure to read from x on each iteration, presumably because x is volatile - but just dumps the result to edx, presumably because the destination y isn't declared volatile and is never read, meaning changes to it can be nixed under the as-if rule. OK, makes sense.

      • Well, I mean... kinda. Like, not really, because volatile is really for hardware registers, and clearly a local value can't be one of those - and can't otherwise be modified in a volatile way unless its address is passed out... which it's not. Look, there's just not a lot of sense to be had out of volatile local values. But C++ lets us declare them and tries to do something with them. And so, confused as always, we stumble onwards.
    • g(): What. By moving the volatile source into a pass-by-value parameter, which is still just another local variable, GCC somehow decides it's not volatile, or at least less so, and so it doesn't need to read it every iteration... but it still carries out the loop, despite its body now doing nothing.

    • h(): By taking the passed volatile as pass-by-reference, the same effective behaviour as f() is restored, so the loop does volatile reads.

      • This case alone actually makes practical sense to me, for reasons outlined above against f(). To elaborate: Imagine x refers to a hardware register, of which every read has side-effects. You wouldn't want to skip any of those.
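
    To make the hardware-register point concrete, here is a minimal sketch that models such a register purely in software. FakeFifoRegister, drain and check_drain are hypothetical names, not part of the question's code, and a real device would be reached through a volatile-qualified address rather than a member function.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical software model of a device register whose reads have side
// effects: each read pops the next byte of a FIFO. Not a real device or API.
struct FakeFifoRegister {
    unsigned char next = 0;   // value the next read will return
    std::size_t reads = 0;    // how many reads have happened so far

    unsigned char read()
    {
        ++reads;
        return next++;
    }
};

// Same shape as the loop in h(): one read of the source per byte stored.
void drain(FakeFifoRegister &reg, unsigned char *out, std::size_t n)
{
    while (n--) {
        *out++ = reg.read();
    }
}

// Skipping or merging reads would change both the data and the device state.
void check_drain()
{
    FakeFifoRegister reg;
    unsigned char buf[4] = {};
    drain(reg, buf, sizeof buf);
    assert(reg.reads == 4);
    assert(buf[0] == 0 && buf[3] == 3);
}
```

    In this model, reading the register once and reusing the value would fill the buffer with zeros and leave the FIFO undrained, which is the kind of change volatile is meant to rule out for real registers.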

    Adding #define volatile /**/ leads to main() being a no-op, as you'd expect. So, when present, even on a local variable volatile does do something... I just have no idea what in the case of g(). What on Earth is going on there?

    Questions

    • Why does a local value declared in-body produce different results from a by-value parameter, with the latter letting reads be optimised away? Both are declared volatile. Neither has its address passed out - and neither has a static address, ruling out any inline-ASM POKEry - so they can never be modified outwith the function. The compiler can see that each is constant, need never be re-read, and volatile just ain't true -
      • so (A) is either allowed to be elided under such constraints? (acting as-if they weren't declared volatile) -
      • and (B) why does only one get elided? Are some volatile local variables more volatile than others?
    • Setting aside that inconsistency for just a moment: After the read was optimised away, why does the compiler still generate the loop? It does nothing! Why doesn't the optimiser elide it as-if no loop was coded?

    Is this a weird corner case due to order of optimising analyses or such? As the code is a daft thought-experiment, I wouldn't chastise GCC for this, but it'd be good to know for sure. (Or is g() the manual timing loop people have dreamt of all these years?) If we conclude there's no Standard bearing on any of this, I'll move it to their Bugzilla just for their information.

    And of course, the more important question from a practical perspective, though I don't want that to overshadow the potential for compiler geekery... Which, if any of these, are well-defined/correct according to the Standard?

    Solution

    For f: GCC eliminates the non-volatile stores (but not the loads, which can have side-effects if the source location is a memory mapped hardware register). There is really nothing surprising here.
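
    As a rough illustration of why f()'s stores were fair game, the sketch below is a variant in which the destination is read back afterwards, so the stored values feed an observable result. fill_and_sum and check_fill_and_sum are hypothetical names, not from the question. How a given compiler lowers this is best checked in its assembly output; the only claim here is that the values read from x must reach the returned sum.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical variant of f(): same loop, but the destination is read back
// afterwards, so the stored values feed an observable result.
unsigned long fill_and_sum(unsigned char *out, std::size_t n)
{
    volatile unsigned char const x = 42;

    for (std::size_t i = 0; i < n; ++i) {
        out[i] = x;   // volatile read of x, plain store to out[i]
    }

    unsigned long sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += out[i];   // the stores now matter to the return value
    }
    return sum;
}

void check_fill_and_sum()
{
    unsigned char buf[10];
    assert(fill_and_sum(buf, sizeof buf) == 420);
    assert(buf[0] == 42 && buf[9] == 42);
}
```

    In the question's main(), by contrast, nothing ever reads y after the calls, which is why only the volatile loads are left in the generated loop.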

    For g: Because of the x86_64 ABI, the parameter x of g is allocated in a register (i.e. rdx) and does not have a location in memory. Reading a general purpose register does not have any observable side effects, so the dead read gets eliminated.
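
    If the per-iteration reads are actually wanted, the question's own h() hints at one route: have the callee reach the source through a volatile-qualified reference or pointer, so that it refers to an object in memory rather than a by-value copy. A minimal sketch with a pointer follows; g_via_pointer and check_g_via_pointer are hypothetical names, and whether a particular compiler keeps every read is something to confirm in the generated assembly rather than take from this example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical variant of g(): the source is reached through a pointer to
// volatile, so it refers to an object in memory rather than a by-value copy.
void g_via_pointer(void *const p, std::size_t n, volatile unsigned char const *x)
{
    unsigned char *y = static_cast<unsigned char *>(p);

    while (n--) {
        *y++ = *x;   // each iteration reads through the volatile lvalue
    }
}

void check_g_via_pointer()
{
    volatile unsigned char const x{99};
    unsigned char buf[8] = {};
    g_via_pointer(buf, sizeof buf, &x);
    assert(buf[0] == 99 && buf[7] == 99);
}
```

    The test only checks the values that end up in the buffer; it says nothing about how many loads the compiler emits.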
