When can I confidently compile program with -O3?


Question


I've seen a lot of people complaining about the -O3 option:

I checked the GCC manual:

   -O3    Optimize yet more.  -O3 turns on all optimizations
          specified   by   -O2   and   also   turns  on  the
          -finline-functions and -frename-registers options.

And I've also checked the GCC source to confirm that these two options are the only optimizations -O3 turns on:

if (optimize >= 3){
    flag_inline_functions = 1;
    flag_rename_registers = 1;
}

For those two optimizations:

  • -finline-functions is useful in some cases (mainly with C++) because it lets us control the size of inlined functions (600 by default) with -finline-limit. The compiler may report an error complaining about lack of memory when a high inline limit is set.
  • -frename-registers attempts to avoid false dependencies in scheduled code by making use of registers left over after register allocation. This optimization will most benefit processors with lots of registers.

For inline-functions: although it can reduce the number of function calls, it may lead to large binary files, so -finline-functions may introduce severe cache penalties and end up even slower than -O2. I think the cache penalties depend on more than just the program itself.

For rename-registers: I don't think it will have any positive impact on a CISC architecture like x86.
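As an aside, if only parts of a program benefit from these optimizations, GCC's function attributes let you enable or block them per function instead of compiling the whole file with -O3. A minimal sketch (GCC-specific; the function names are hypothetical):

```cpp
// GCC-specific: opt one hot function into -O3-style inlining while the
// rest of the translation unit is built with -O2.
__attribute__((optimize("inline-functions")))
int sum3(int a, int b, int c) { return a + b + c; }

// Conversely, noinline keeps a function out of -finline-functions, which
// can help when aggressive inlining bloats the binary and hurts the
// instruction cache.
__attribute__((noinline))
int product3(int a, int b, int c) { return a * b * c; }
```

Whether this actually helps still has to be settled by benchmarking, just as with -O3 itself.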

My question has 2.5 parts:

  1. Am I right to claim that whether a program can run faster with the -O3 option depends on the underlying platform/architecture? [Answered]

    EDIT:

    The 1st part has been confirmed as true. David Hammen also claim that we should be very careful with regard to how optimization and floating point operations interact on machines with extended precision floating point registers like Intel and AMD.

  2. When can I confidently use the -O3 option? I suppose these two optimizations, especially rename-registers, may lead to behavior different from -O0/-O2. I saw some programs compiled with -O3 crash during execution; is it deterministic? If I run an executable once without any crash, does that mean it is safe to use -O3?

    EDIT: The determinism has nothing to do with the optimization; it is a multithreading problem. However, for a multithreaded program, running the executable once without errors does not make -O3 safe. David Hammen shows that -O3 optimization of floating point operations may violate the strict weak ordering criterion for a comparison. Are there any other concerns we need to take care of when we want to use the -O3 option?

  3. If the answer to the 1st question is "yes", then when I change the target platform, or in a distributed system with different machines, I may need to switch between -O3 and -O2. Are there any general ways to decide whether I can get a performance improvement with -O3? For example, more registers, short inline functions, etc. [Answered]

    EDIT: The 3rd part has been answered by Louen: "the variety of platforms make general reasoning about this problem impossible". When evaluating the performance gain from -O3, we have to try both and benchmark our code to see which is faster.

Solution

  1. I saw some programs crash when compiled with -O3, is it deterministic?

If the program is single threaded, all algorithms used by the program are deterministic, and the inputs from run to run are identical, then yes. If any of those conditions does not hold, the answer is "not necessarily".

The same applies if you compile without using -O3.

If I run an executable once without any crash, does it mean it is safe to use -O3?

Of course not. Once again, the same applies if you compile without using -O3. Just because your application runs once does not mean it will run successfully in all cases. That's part of what makes testing a hard problem.


Floating point operations can result in weird behaviors on machines in which the floating point registers have greater precision than do doubles. For example,

void add (double a, double b, double & result) {
   double temp = a + b;
   result = temp;
   if (result != temp) {
      throw FunkyAdditionError (temp);
   }
}

Compile a program that uses this add function unoptimized and you probably will never see any FunkyAdditionError exceptions. Compile optimized and certain inputs will suddenly start resulting in these exceptions. The problem is that with optimization, the compiler will keep temp in a register, while result, being a reference, won't be compiled away into a register. Add an inline qualifier and those exceptions may disappear when your program is compiled with -O3, because now result can also be a register. Optimization with regard to floating point operations can be a tricky subject.
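One common workaround (a sketch, not part of the original answer; the function name and exception message are illustrative) is to force the intermediate value through memory, for example with volatile, so that temp is rounded to 64 bits before the comparison. GCC's -ffloat-store flag has a similar effect file-wide:

```cpp
#include <stdexcept>

// Storing through a volatile double forces the sum out of any extended
// precision register and rounds it to a 64-bit double in memory, so the
// later comparison sees the same value regardless of optimization level.
void add_consistent(double a, double b, double &result) {
    volatile double temp = a + b;  // rounded to a plain double here
    result = temp;
    if (result != temp) {
        throw std::runtime_error("inconsistent rounding");
    }
}
```

This trades a little speed for consistent results across optimization levels.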

Finally, let's look at one of those cases where things did go bump in the night when a program was compiled with -O3, GCC: program doesn't work with compilation option -O3. The problem only occurred with -O3 because the compiler probably inlined the distance function but kept one (but not both) of the results in an extended precision floating point register. With this optimization, certain points p1 and p2 can result in both p1<p2 and p2<p1 evaluating to true. This violates the strict weak ordering criterion for a comparison function.
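A defensive pattern against this class of bug (a sketch; the Point type and distance function here are stand-ins for the ones in the linked question) is to compute each sort key exactly once and compare the cached doubles, so no comparison depends on whether a recomputed value happened to stay in an extended-precision register:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

struct Point { double x, y; };

double distance(const Point &p) { return std::sqrt(p.x * p.x + p.y * p.y); }

// Sort points by distance using precomputed keys. Each distance is
// evaluated once, so the comparator is a strict weak ordering even if
// distance() itself rounds differently between calls under -O3.
void sort_by_distance(std::vector<Point> &pts) {
    std::vector<std::pair<double, Point>> keyed;
    keyed.reserve(pts.size());
    for (const Point &p : pts)
        keyed.emplace_back(distance(p), p);  // key computed exactly once
    std::sort(keyed.begin(), keyed.end(),
              [](const std::pair<double, Point> &a,
                 const std::pair<double, Point> &b) { return a.first < b.first; });
    for (std::size_t i = 0; i < pts.size(); ++i)
        pts[i] = keyed[i].second;
}
```

This costs extra memory for the key array, but it removes the comparator's dependence on floating point reproducibility entirely.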

You need to be very careful with regard to how optimization and floating point operations interact on machines with extended precision floating point registers (e.g., Intel and AMD).
