Optimization barrier for microbenchmarks in MSVC: tell the optimizer you clobber memory?

Question

Chandler Carruth introduced two functions in his CppCon 2015 talk that can be used for fine-grained inhibition of the optimizer. They are useful for writing micro-benchmarks that the optimizer won't simply nuke into meaninglessness.

void clobber() {
  asm volatile("" : : : "memory");
}

void escape(void* p) {
  asm volatile("" : : "g"(p) : "memory");
}    

These use inline assembly statements to change the assumptions of the optimizer.

The assembly statement in clobber states that the assembly code in it may read and write anywhere in memory. The actual assembly code is empty, but the optimizer does not inspect the body of an asm statement, and volatile keeps it from deleting the statement. It simply believes us when we tell it the code might read and write everywhere in memory. This effectively prevents the optimizer from reordering or discarding memory writes made before the call to clobber, and forces memory reads after the call to clobber†.

The statement in escape additionally makes the pointer p visible to the assembly block. Again, because the optimizer won't look into the actual inline assembly code, that code can be empty, and the optimizer will still assume that the block uses the address held in the pointer p. This effectively forces whatever p points to to be in memory and not in a register, because the assembly block might perform a read from that address.

(This is important because the clobber function won't force reads or writes for anything that the compiler decides to keep in a register, since the assembly statement in clobber doesn't state that anything in particular must be visible to the assembly.)
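
For illustration, here is a minimal sketch of how the two barriers might be combined in a benchmark body. It compiles only with GCC or Clang, and the function name bench_push_back_once is hypothetical (it is not from the talk or from this question):

```cpp
#include <cassert>
#include <vector>

// GCC/Clang only: both functions rely on extended inline assembly.
static void clobber() {
  asm volatile("" : : : "memory");
}

static void escape(void* p) {
  asm volatile("" : : "g"(p) : "memory");
}

// Hypothetical benchmark body: a single push_back into a
// pre-reserved vector.
int bench_push_back_once() {
  std::vector<int> v;
  v.reserve(1);
  assert(v.capacity() >= 1);

  escape(v.data());  // the buffer's address is now visible to the asm block
  v.push_back(42);   // so this store can no longer be treated as dead
  clobber();         // pretend all of memory was read and written here

  return v[0];       // must be reloaded from memory after the clobber
}
```

Without escape, the optimizer would be free to conclude that nobody can observe the vector's buffer and drop the store; without clobber, nothing would force the store to have happened by that point.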

All of this happens without any additional code being generated directly by these "barriers". They are purely compile-time artifacts.

However, these rely on language extensions supported by GCC and Clang. Is there a way to get similar behaviour when using MSVC?

† To understand why the optimizer has to think this way, imagine that the assembly block were a loop adding 1 to every byte in memory.

Recommended answer

Given your approximation of escape(), you should also be fine with the following approximation of clobber(). Note that this is a draft idea: it defers part of the solution to the implementation of the function nextLocationToClobber().

// always returns false, but in an undeducible way
bool isClobberingEnabled();

// The challenge is to implement this function in a way,
// that will make even the smartest optimizer believe that
// it can deliver a valid pointer pointing anywhere in the heap,
// stack or the static memory.
volatile char* nextLocationToClobber();

const bool clobberingIsEnabled = isClobberingEnabled();
volatile char* clobberingPtr;

inline void clobber() {
    if ( clobberingIsEnabled ) {
        // This will never be executed, but the compiler
        // cannot know about it.
        clobberingPtr = nextLocationToClobber();
        *clobberingPtr = *clobberingPtr;
    }
}


Update

Question: How would you ensure that isClobberingEnabled returns false "in an undeducible way"? Certainly it would be trivial to place the definition in another translation unit, but the minute you enable LTCG, that strategy is defeated. What did you have in mind?

Answer: We can take advantage of a hard-to-prove property from number theory, for example Fermat's Last Theorem:

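#include <cstdint>
#include <cstdlib>
#include <ctime>

bool undeducible_false() {
    // It took mathematicians more than 3 centuries to prove Fermat's
    // Last Theorem in its most general form. It is unlikely that this
    // knowledge has been put into compilers, or that a compiler will try
    // hard enough to check all one million possible combinations below.

    // Caveat: avoid integer overflow (Fermat's theorem
    //         doesn't hold for modulo arithmetic).
    // Each value is converted to unsigned before the modulo so that it
    // always lands in [1, 100], even if std::clock() fails and returns -1.
    std::uint32_t a = static_cast<std::uint32_t>(std::clock()) % 100 + 1;
    std::uint32_t b = static_cast<std::uint32_t>(std::rand()) % 100 + 1;
    std::uint32_t c = static_cast<std::uint32_t>(
        reinterpret_cast<std::uintptr_t>(&a) % 100) + 1;

    // At most 3 * 100^3 = 3,000,000, so nothing overflows 32 bits.
    return a*a*a + b*b*b == c*c*c;
}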
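
To show how the pieces could fit together, here is a minimal single-file sketch that backs isClobberingEnabled() with undeducible_false(). The body of nextLocationToClobber() is only a placeholder assumption, not a solution to the challenge described above: it turns a run-time integer into a pointer, which is never dereferenced because the guard is always false, and nothing guarantees that a given optimizer treats it as able to point anywhere. The helper store_then_clobber is likewise hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <ctime>

// Always false: a^3 + b^3 == c^3 has no solution in positive integers,
// and with values in [1, 100] the sums cannot overflow 32 bits.
bool undeducible_false() {
    std::uint32_t a = static_cast<std::uint32_t>(std::clock()) % 100 + 1;
    std::uint32_t b = static_cast<std::uint32_t>(std::rand()) % 100 + 1;
    std::uint32_t c = static_cast<std::uint32_t>(
        reinterpret_cast<std::uintptr_t>(&a) % 100) + 1;
    return a*a*a + b*b*b == c*c*c;
}

bool isClobberingEnabled() { return undeducible_false(); }

// PLACEHOLDER (assumption): builds a pointer from a run-time integer.
// It must never actually be dereferenced; it only has to look, to the
// compiler, like it could point to any object.
volatile char* nextLocationToClobber() {
    return reinterpret_cast<volatile char*>(
        static_cast<std::uintptr_t>(std::rand()));
}

const bool clobberingIsEnabled = isClobberingEnabled();
volatile char* clobberingPtr;

inline void clobber() {
    if ( clobberingIsEnabled ) {
        // Never executed, but the compiler cannot know that.
        clobberingPtr = nextLocationToClobber();
        *clobberingPtr = *clobberingPtr;
    }
}

// Hypothetical usage: a store followed by the barrier.
int store_then_clobber() {
    static int sink = 0;
    assert(!clobberingIsEnabled);  // the guard must never be true
    sink = 7;
    clobber();
    return sink;
}
```

Because the guarded branch never runs, the barrier costs one predictable branch on a global at run time, unlike the inline assembly version, which generates no code at all.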
