GCC optimization levels. Which is better?


Question



I am focusing on the CPU and memory consumption of programs compiled by GCC.

Is code compiled with -O3 always so greedy in terms of resources?

Is there any scientific reference or specification that shows how memory and CPU consumption differ between optimization levels?

People working on this problem often focus on the impact of these optimizations on execution time, compiled code size, and energy. However, I can't find much work on resource consumption when optimizations are enabled.

Thanks in advance.

Solution

No, there is no absolute way, because optimization in compilers is an art (and is not even well defined, and might be undecidable or intractable).

But some guidelines first:

  • be sure that your program is correct and has no bugs before optimizing anything, so do debug and test your program

  • have well designed test cases and representative benchmarks (see this).

  • be sure that your program has no undefined behavior (this is tricky, see this), since GCC will optimize strangely (but very often correctly, according to the C99 or C11 standards) if you have UB in your code; use the -fsanitize= style options (and gdb and valgrind ...) during the debugging phase.

  • profile your code (on various benchmarks), in particular to find out which parts are worth optimization efforts; often (but not always) most of the CPU time is spent in a small fraction of the code (rule of thumb: 80% of the time is spent in 20% of the code; for some applications, such as the gcc compiler itself, this is not true; run gcc -ftime-report to make gcc show the time spent in its various compiler modules). Most of the time "premature optimization is the root of all evil" (but there are exceptions to this aphorism).

  • improve your source code (e.g. carefully and correctly use restrict and const, add some pragmas or function or variable attributes, perhaps make judicious use of some GCC builtins: __builtin_expect, __builtin_prefetch -see this-, __builtin_unreachable...)

  • use a recent compiler. The current version of GCC (October 2015) is 5.2, and optimization keeps improving; you might consider compiling GCC from its source code to have a recent version.

  • enable all warnings (gcc -Wall -Wextra) in the compiler, and try hard to avoid all of them; some warnings may appear only when you ask for optimization (e.g. with -O2)

  • Usually, compile with -O2 -march=native (or perhaps -mtune=native; I assume that you are not cross-compiling, and if you are, add the appropriate -march option ...) and benchmark your program with that

  • Consider link-time optimization by compiling and linking with -flto and the same optimization flags. E.g., put CC= gcc -flto -O2 -march=native in your Makefile (then remove -O2 -mtune=native from your CFLAGS there)...

  • Also try -O3 -march=native; usually you might get a tiny improvement over -O2 (but not always: you might sometimes have slightly faster code with -O2 than with -O3, though this is uncommon)

  • If you want to optimize the generated program size, use -Os instead of -O2 or -O3; more generally, don't forget to read the section Options That Control Optimization of the documentation. I guess that both -O2 and -Os would optimize the stack usage (which is closely related to memory consumption). And some GCC optimizations are able to avoid malloc (which is related to heap memory consumption).

  • you might consider profile-guided optimizations, -fprofile-generate, -fprofile-use, -fauto-profile options

  • dive into the documentation of GCC, it has numerous optimization & code generation arguments (e.g. -ffast-math, -Ofast ...) and parameters and you could spend months trying some more of them; beware that some of them are not strictly C standard conforming!

  • if you have a lot of time to spend (weeks or months), you might customize GCC using MELT to add your own new (application-specific) optimization passes; but this is difficult (you'll need to understand GCC internal representations and organization) and probably rarely worthwhile, except in very specific cases (those when you can justify spending months of your time for improving optimization)

  • you might want to understand the stack usage of your program, so use -fstack-usage

  • you might want to understand the emitted assembler code, so use -S -fverbose-asm in addition to the optimization flags (and look into the produced .s assembler file)

  • you might want to understand the internal workings of GCC, so use various -fdump-* flags (you'll get hundreds of dump files!).
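As an illustration of the "improve your source code" guideline above, here is a minimal sketch of restrict, __builtin_expect and __builtin_unreachable in use. The function names and the LIKELY/UNLIKELY macros are hypothetical, chosen only for this example; whether any of these hints actually helps has to be measured on your own code.

```c
#include <stddef.h>

/* Hypothetical helper macros wrapping the GCC builtin named in the answer. */
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* restrict promises that dst and src do not overlap, which may let GCC
   vectorize the loop without runtime alias checks. */
void scale_add(float *restrict dst, const float *restrict src,
               float factor, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += factor * src[i];
}

/* Sum the non-negative entries; negative entries are assumed to be rare,
   so that branch is marked as unlikely. */
long sum_valid(const int *values, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (UNLIKELY(values[i] < 0))
            continue;               /* cold path */
        total += values[i];
    }
    return total;
}

/* Map a two-bit code to a weight. The mask keeps the value in 0..3,
   so the default branch really can never be reached. */
int weight_of(unsigned code)
{
    switch (code & 3u) {
    case 0: return 1;
    case 1: return 2;
    case 2: return 4;
    case 3: return 8;
    default: __builtin_unreachable();
    }
}
```

Compiling this with gcc -O2 -march=native -S -fverbose-asm and reading the .s file, with and without the hints, is one way to see whether GCC took them into account.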

Of course the above todo list should be used in an iterative and agile fashion.
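The undefined-behavior guideline in the list is easier to appreciate with a concrete case. The sketch below (hypothetical function names) shows an overflow check that relies on signed overflow, which is undefined in C, next to a well-defined version. With optimization enabled, GCC is allowed to assume that the first comparison is always false.

```c
#include <limits.h>
#include <stdbool.h>

/* Broken: if a + 1 overflows, the behavior is undefined, so an optimizing
   compiler may assume that a + 1 < a can never be true. */
bool will_overflow_ub(int a)
{
    return a + 1 < a;
}

/* Well defined: compare against INT_MAX instead of performing the addition. */
bool will_overflow_ok(int a)
{
    return a == INT_MAX;
}
```

Building a debug version with gcc -g -fsanitize=undefined should report the overflow at run time when will_overflow_ub is called with INT_MAX.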

For memory leak bugs, consider valgrind and several -fsanitize= debugging options. Also read about garbage collection (and the GC handbook), notably Boehm's conservative garbage collector, and about compile-time garbage collection techniques.
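A minimal sketch of the kind of leak these tools catch, using hypothetical function names: length_leaky never frees its temporary copy, while length_fixed returns the same value and releases it.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of s, or NULL on allocation failure.
   The caller owns the result and must free() it. */
char *copy_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Leaky: the copy is never freed, so each successful call
   loses strlen(s) + 1 bytes. */
size_t length_leaky(const char *s)
{
    char *copy = copy_string(s);
    return copy ? strlen(copy) : 0;
}

/* Fixed: same result, but the temporary copy is released. */
size_t length_fixed(const char *s)
{
    char *copy = copy_string(s);
    if (copy == NULL)
        return 0;
    size_t len = strlen(copy);
    free(copy);
    return len;
}
```

On platforms where leak detection is supported, a program calling length_leaky can be checked with valgrind --leak-check=full ./prog, or by building it with gcc -g -fsanitize=address.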

Read about the MILEPOST project in GCC.

Consider also OpenMP, OpenCL, MPI, multi-threading, etc... Notice that parallelization is a difficult art.
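As a small taste of OpenMP, here is a sketch of a dot product parallelized with a reduction clause. The function name dot is an assumption made for this example. Built with gcc -fopenmp, the loop iterations are split across threads; without that flag the pragma is ignored and the loop runs serially.

```c
#include <stddef.h>

/* Dot product with an OpenMP reduction: each thread accumulates a private
   partial sum, and the partial sums are added together at the end.
   The parallel result can differ from the serial one by floating-point
   rounding, because the additions happen in a different order. */
double dot(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

For small arrays the cost of starting threads usually outweighs the gain, which is one reason parallel versions need benchmarking just like optimization flags do.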

Notice that even GCC developers are often unable to predict the effect (on CPU time of the produced binary) of such and such optimization. Somehow optimization is a black art.
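Since the effect of a flag is hard to predict, it has to be measured. The following is a rough sketch of a timing harness, not a rigorous benchmark: sum_squares is a hypothetical stand-in for your real hot loop, and the empty asm statement is a GCC extension used here as a compiler barrier.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical workload: sum of squares, standing in for the real hot loop. */
unsigned long sum_squares(const unsigned *v, size_t n)
{
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++)
        total += (unsigned long)v[i] * v[i];
    return total;
}

/* Run the workload `repeats` times and return the CPU seconds consumed.
   The result is stored through a volatile pointer, and the "memory"
   clobber tells GCC that memory may have changed, so the computation is
   neither discarded nor hoisted out of the loop. */
double time_workload(const unsigned *v, size_t n, int repeats,
                     volatile unsigned long *result)
{
    clock_t start = clock();
    for (int r = 0; r < repeats; r++) {
        *result = sum_squares(v, n);
        __asm__ volatile("" ::: "memory");
    }
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Build the same file once with -O2 and once with -O3 (plus -march=native if appropriate), run each binary several times on representative input, and compare the reported times.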

Perhaps gcc-help@gcc.gnu.org might be a good place to ask more specific, precise and focused questions about optimizations in GCC.

You could also contact me on basileatstarynkevitchdotnet with a more focused question... (and mention the URL of your original question)

For scientific papers on optimizations, you'll find lots of them. Start with ACM TOPLAS, ACM TACO, etc. Search for iterative compiler optimization and similar topics. And define more precisely which resources you want to optimize for ("memory consumption" on its own means next to nothing).
