I don't understand the definition of DoNotOptimizeAway

Question

I am looking through the Celero git repository for the meaning of DoNotOptimizeAway, but I still don't get it. Could you please help me understand it in layman's terms, as much as you can? The documentation says:

The celero::DoNotOptimizeAway template is provided to ensure that the optimizing compiler does not eliminate your function or code. Since this feature is used in all of the sample benchmarks and their baseline, its time overhead is canceled out in the comparisons.

Recommended answer

You haven't included the definition, just the documentation. I think you're asking for help understanding why it even exists, rather than the definition.

It stops compilers from applying CSE (common subexpression elimination) across iterations and from hoisting work out of repeat-loops, so you can repeat the same work enough times for it to be measurable. For example, put something short in a loop that runs 1 billion times, and then you can easily measure the time for the whole loop (a second or so). See the question Can x86's MOV really be "free"? Why can't I reproduce this at all? for an example of doing this by hand in asm. If you want compiler-generated code like that, you need a function / macro like DoNotOptimizeAway.
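As a rough illustration of such a repeat loop, here is a minimal sketch for GNU C++ (GCC or Clang). The names launder, sink and time_repeated_work are hypothetical and are not Celero's definitions. Note that sinking the result alone is not enough here: the input is also passed through a "+r" operand each iteration, otherwise the compiler could compute the result once and hoist it out of the loop.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helpers for GNU C++ (GCC/Clang); not Celero's actual definition.

// Pretend to modify `v`, so the compiler cannot treat it as loop-invariant.
inline void launder(std::uint64_t& v) { asm volatile("" : "+r"(v)); }

// Pretend to read `v`, so the compiler must actually compute it.
inline void sink(std::uint64_t v) { asm volatile("" : : "r"(v)); }

// Time `iterations` repetitions of a short computation; returns seconds.
inline double time_repeated_work(std::uint64_t iterations) {
    std::uint64_t input = 12345;
    auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        launder(input);                            // input may have "changed"
        std::uint64_t result = input * input + 7;  // the work being measured
        sink(result);                              // result is "used"
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling time_repeated_work(1000000000) and dividing by the iteration count gives an approximate per-iteration time. Checking the compiler's asm output is still the only way to be sure the loop body survived as intended.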

Compiling the whole program with optimization disabled would be useless: storing and reloading every variable between C++ statements creates very different bottlenecks (usually store-forwarding latency) from those in optimized code. See the question Adding a redundant assignment speeds up code when compiled without optimization.

See also Idiomatic way of performance evaluation? for general microbenchmarking pitfalls.

Perhaps looking at the actual definition can also help.

This Q&A (Optimization barrier for microbenchmarks in MSVC: tell the optimizer you clobber memory?) describes how one implementation of a DoNotOptimize macro works (and asks how to port it from GNU C++ to MSVC).

The escape macro is from Chandler Carruth's CppCon 2015 talk, "Tuning C++: Benchmarks, and CPUs, and Compilers! Oh My!". That talk also goes into detail about exactly why it's needed when writing microbenchmarks: to stop whole loops from being optimized away when you compile with optimization enabled.
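For reference, the talk presents escape together with a companion clobber, written there as small functions using GNU inline asm (so they work with GCC and Clang but not MSVC). The sketch below reproduces them as I recall them; fill_vector_once is a hypothetical benchmark body added only to show how they might be used.

```cpp
#include <cstddef>
#include <vector>

// GNU C++ only. The compiler must assume the asm statement may read or write
// the memory reachable through `p`, so stores to that memory cannot be dropped.
static void escape(void* p) { asm volatile("" : : "g"(p) : "memory"); }

// Tells the compiler that any memory may have been read or written, which
// forces pending stores to actually happen before this point.
static void clobber() { asm volatile("" : : : "memory"); }

// Hypothetical benchmark body: without escape/clobber, an optimizer that sees
// the vector is never observed could delete the push_back entirely.
inline std::size_t fill_vector_once() {
    std::vector<int> v;
    v.reserve(1);
    escape(v.data());  // the buffer is now "visible" to the outside world
    v.push_back(42);
    clobber();         // the store of 42 must be performed
    return v.size();
}
```

Neither function emits any instructions of its own; they only constrain what the optimizer is allowed to assume.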

(If the problem is instead that the compiler hoists things out of the loop rather than computing them repeatedly, that is harder to get right. Making a function __attribute__((noinline)) can help if the function is big enough that it didn't need to be inlined anyway. Check the compiler's asm output to see how much setup it hoisted.)

And BTW, a good definition for GNU C / C++ normally has zero extra cost:
asm volatile("" :: "r"(my_var)); compiles to zero asm instructions, but requires the compiler to have the value of my_var in a register of its choice. (And because of asm volatile, has to "run" that many times in the C++ abstract machine).

This will only impact optimization if the compiler could have transformed the calculation it was part of into something else. (e.g. using this on a loop counter would stop the compiler from using just pointer increments and a compare against an end-pointer to do the right number of iterations of for(i=0;i<n;i++) sum+=a[i];)

Using a read-modify-write operand like asm volatile("" :"+r"(my_var)); would force the compiler to forget all range-restriction or constant-propagation info it knows about the value (e.g. that it's 42, or that it's non-negative), and treat it like an incoming function arg. This could impact optimization more.
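A small sketch of that effect, assuming GCC or Clang; hide_value, divide_known and divide_hidden are hypothetical names used only for illustration. Both functions return the same result, but the optimizer can fold the first one to a constant while the second has to emit real code for a signed division of an unknown value.

```cpp
// Hypothetical helper (GNU C++): the "+r" operand makes the compiler assume
// the asm statement may have changed `v`, so it forgets that v was 42.
inline int hide_value(int v) {
    asm volatile("" : "+r"(v));
    return v;
}

// The compiler knows x == 42 and can fold this to "return 21".
inline int divide_known() {
    int x = 42;
    return x / 2;
}

// The compiler no longer knows the value or even the sign of x, so it has to
// emit code that handles negative inputs for the signed division by 2.
inline int divide_hidden() {
    int x = hide_value(42);
    return x / 2;
}
```

Comparing the asm for the two functions (for example at -O2) shows how much information a "+r" operand throws away compared with a plain "r" input.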

When they say the "overhead is cancelled out in comparisons", they're hopefully not talking about explicitly subtracting anything from a single timing result, and not talking about benchmarking DoNotOptimizeAway on its own.

That wouldn't work. Performance analysis for modern CPUs does not work by adding up the costs of each instruction. Out-of-order pipelined execution means that an extra asm instruction can easily have zero extra cost if the front-end (total instruction throughput) wasn't the bottleneck, and if the execution unit it needs wasn't either.

If their portable definition is something like volatile T sink = input;, the extra asm store would only have a cost if your code bottlenecked on store throughput to cache.
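A portable fallback along those lines might look like the sketch below. This is a guess at the general shape, not Celero's actual code, and do_not_optimize_away_portable and sum_to are hypothetical names. It assumes T is a simple type such as an integer or a pointer.

```cpp
// Hypothetical portable sink: the store to a volatile object is an observable
// side effect, so the compiler must compute `input` and emit the store.
template <typename T>
inline void do_not_optimize_away_portable(const T& input) {
    volatile T sink = input;
}

// Example use: each partial sum is stored once per iteration, which keeps the
// compiler from replacing the whole loop with a closed-form result.
inline long sum_to(long n) {
    long total = 0;
    for (long i = 0; i < n; ++i) {
        total += i;
        do_not_optimize_away_portable(total);
    }
    return total;
}
```

Unlike the empty asm statement, this version costs one real store instruction per call, which is the extra asm store discussed above.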

So that claim about cancelling out sounds a bit optimistic, for the reasons explained above, plus the context- and optimization-dependent factors described earlier. (It's possible that a DoNotOptimizeAway ...)

Related Q&As about the same functions:

  • Preventing compiler optimizations while benchmarking
  • Avoid optimizing away variable with inline asm
  • "Escape" and "Clobber" equivalent in MSVC
