When can I confidently compile a program with -O3?

Question

I've seen a lot of people complaining about the -O3 option:

GCC: program doesn't work with compilation option -O3

Floating Point Problem provided by David Hammen

I checked the GCC manual:

   -O3    Optimize yet more.  -O3 turns on all optimizations
          specified   by   -O2   and   also   turns  on  the
          -finline-functions and -frename-registers options.

I've also checked the source code to confirm that these two options are the only optimizations added by -O3:

if (optimize >= 3){
    flag_inline_functions = 1;
    flag_rename_registers = 1;
}

For those two optimizations:

-finline-functions is useful in some cases (mainly with C++) because it lets us define the size of inlined functions (600 by default) with -finline-limit. The compiler may report an error complaining about lack of memory when a high inline limit is set.

-frename-registers attempts to avoid false dependencies in scheduled code by making use of registers left over after register allocation. This optimization will most benefit processors with lots of registers.

For -finline-functions: although inlining can reduce the number of function calls, it may produce a larger binary, so -finline-functions may introduce severe cache penalties and make the program even slower than with -O2. I think the cache penalties do not depend only on the program itself.

For -frename-registers, I don't think it will have any positive impact on a CISC architecture like x86.

My question has 2.5 parts:

[Answered] 1. Am I right to claim that whether a program runs faster with the -O3 option depends on the underlying platform/architecture?

EDIT: The 1st part has been confirmed as true. David Hammen also claims that we should be very careful about how optimization and floating point operations interact on machines with extended precision floating point registers, such as Intel and AMD.

2. When can I confidently use the -O3 option? I suppose these two optimizations, especially -frename-registers, may lead to behavior that differs from -O0/-O2. I have seen some programs compiled with -O3 crash during execution; is that deterministic? If I run an executable once without any crash, does that mean it is safe to use -O3?

EDIT: Determinism has nothing to do with the optimization; it is a multithreading problem. However, for a multithreaded program, running an executable once without errors does not make it safe to use -O3. David Hammen shows that -O3 optimization of floating point operations may violate the strict weak ordering criterion for a comparison. Are there any other concerns we need to address when we want to use the -O3 option?

[Answered] 3. If the answer to the 1st question is "yes", then when I change the target platform, or in a distributed system with different machines, I may need to switch between -O3 and -O2. Is there any general way to decide whether I can get a performance improvement with -O3? For example, more registers, short inline functions, etc.

EDIT: The 3rd part has been answered by Louen as "the variety of platforms make general reasoning about this problem impossible". When evaluating the performance gain from -O3, we have to build with both -O2 and -O3 and benchmark our code to see which is faster.
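The "try both and benchmark" advice can be sketched as a small timing harness that is built once with -O2 and once with -O3. This is only an illustration: `sum_of_squares` is a hypothetical stand-in for the hot path of a real program, and `best_time_seconds` is a helper name invented here, not part of any library.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical workload; replace it with the code you actually care about.
double sum_of_squares(const std::vector<double>& values) {
    double total = 0.0;
    for (double v : values) {
        total += v * v;
    }
    return total;
}

// Calls `work` `repetitions` times and returns the shortest wall-clock
// time of a single call, in seconds. Taking the minimum reduces noise
// from other processes on the machine.
template <typename Work>
double best_time_seconds(Work work, int repetitions) {
    assert(repetitions > 0);
    double best = -1.0;
    for (int i = 0; i < repetitions; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        const double elapsed =
            std::chrono::duration<double>(stop - start).count();
        if (best < 0.0 || elapsed < best) {
            best = elapsed;
        }
    }
    return best;
}
```

To compare, compile the same program twice (for example with `g++ -O2` and `g++ -O3`), run both on the target machine with representative input, and keep whichever is faster there. Writing the result of the workload to a `volatile` variable inside `work` keeps the compiler from discarding the computation.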

Answer

  1. I saw some programs crash when compiled with -O3; is it deterministic?

If the program is single threaded, all algorithms used by the program are deterministic, and the inputs are identical from run to run, yes. The answer is "not necessarily" if any of those conditions is not true.

The same applies if you compile without using -O3.

If I run an executable once without any crash, does it mean it is safe to use -O3?

Of course not. Once again, the same applies if you compile without using -O3. Just because your application runs once does not mean it will run successfully in all cases. That's part of what makes testing a hard problem.


Floating point operations can result in weird behaviors on machines in which the floating point registers have greater precision than do doubles. For example,

void add (double a, double b, double & result) {
   double temp = a + b;
   result = temp;
   if (result != temp) {
      throw FunkyAdditionError (temp);
   }
}

Compile a program that uses this add function unoptimized and you probably will never see any FunkyAdditionError exceptions. Compile it optimized and certain inputs will suddenly start resulting in these exceptions. The problem is that with optimization, the compiler will make temp a register, while result, being a reference, won't be compiled away into a register. Add an inline qualifier and those exceptions may disappear when your program is compiled with -O3, because now result can also be a register. Optimization with regard to floating point operations can be a tricky subject.

Finally, let's look at one of those cases where things did go bump in the night when a program was compiled with -O3: "GCC: program doesn't work with compilation option -O3". The problem only occurred with -O3 because the compiler probably inlined the distance function but kept one (but not both) of the results in an extended precision floating point register. With this optimization, certain points p1 and p2 can result in both p1<p2 and p2<p1 evaluating to true. This violates the strict weak ordering criterion for a comparison function.
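One common way to guard a comparison against this mismatch is to force both values through memory before comparing them, so that both are rounded to the same 64-bit double. The sketch below assumes a hypothetical `Point` type and a comparison by distance from the origin; it is not the code from the linked question. Storing through `volatile` is a workaround that is generally expected to prevent the value from staying in an extended precision register, not a guarantee from the C++ standard.

```cpp
#include <cmath>

// Hypothetical point type used only for this illustration.
struct Point {
    double x;
    double y;
};

double distance_from_origin(const Point& p) {
    return std::sqrt(p.x * p.x + p.y * p.y);
}

// Compares two points by distance from the origin. Both distances are
// stored in volatile doubles first, so the compiler has to write them
// to memory; the comparison then sees two values of the same precision.
bool closer_to_origin(const Point& a, const Point& b) {
    volatile double da = distance_from_origin(a);
    volatile double db = distance_from_origin(b);
    return da < db;
}

// Checks two properties a strict weak ordering must have for this pair:
// irreflexivity (x < x is false) and asymmetry (a < b and b < a are
// never both true).
bool is_consistent(const Point& a, const Point& b) {
    const bool ab = closer_to_origin(a, b);
    const bool ba = closer_to_origin(b, a);
    return !(ab && ba) && !closer_to_origin(a, a) && !closer_to_origin(b, b);
}
```

A check like `is_consistent` can be run over sample inputs in a test build compiled with the same optimization level as the release build, since the inconsistency only shows up in the optimized binary.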

You need to be very careful with regard to how optimization and floating point operations interact on machines with extended precision floating point registers (e.g., Intel and AMD).
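Whether a given build evaluates floating point expressions in extended precision can be checked through the standard `FLT_EVAL_METHOD` macro from `<cfloat>`. The helper names below are made up for this sketch; the macro values (0, 1, 2, negative) are the ones defined by the C and C++ standards.

```cpp
#include <cfloat>
#include <string>

// Describes how the current build evaluates floating point expressions,
// based on the standard FLT_EVAL_METHOD macro.
std::string float_evaluation_mode() {
    switch (FLT_EVAL_METHOD) {
        case 0:
            return "operations are evaluated in the precision of their type";
        case 1:
            return "float and double operations are evaluated as double";
        case 2:
            return "all operations are evaluated as long double";
        default:
            return "indeterminate or implementation-defined";
    }
}

// True when intermediate results may carry more precision than the
// declared type, which is the situation described above.
bool may_have_excess_precision() {
    return FLT_EVAL_METHOD != 0;
}
```

With GCC, a 64-bit x86 build normally reports 0 because it uses SSE registers, while a 32-bit x86 build that uses the x87 unit typically reports 2. That is one reason the same source can behave differently across targets.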
