Clang vs GCC - which produces faster binaries?

Problem description

I'm currently using GCC, but I discovered Clang recently and I'm pondering switching. There is one deciding factor though - quality (speed, memory footprint, reliability) of binaries it produces - if gcc -O3 can produce a binary that runs 1% faster, or Clang binaries take up more memory or just fail due to compiler bugs, it's a deal-breaker.

Clang boasts better compile speeds and lower compile-time memory footprint than GCC, but I'm really interested in benchmarks/comparisons of resulting compiled software - could you point me to some or describe your experiences?

Recommended answer

Here are some up-to-date albeit narrow findings of mine with GCC 4.7.2 and Clang 3.2 for C++.

UPDATE: GCC 4.8.1 v clang 3.3 comparison appended below.

UPDATE: GCC 4.8.2 v clang 3.4 comparison is appended to that.

I maintain an OSS tool that is built for Linux with both GCC and Clang, and with Microsoft's compiler for Windows. The tool, coan, is a preprocessor and analyser of C/C++ source files and codelines of such: its computational profile majors on recursive-descent parsing and file-handling. The development branch (to which these results pertain) comprises at present around 11K LOC in about 90 files. It is coded, now, in C++ that is rich in polymorphism and templates but is still mired in many patches by its not-so-distant past in hacked-together C. Move semantics are not expressly exploited. It is single-threaded. I have devoted no serious effort to optimizing it, while the "architecture" remains so largely ToDo.

I employed Clang prior to 3.2 only as an experimental compiler because, despite its superior compilation speed and diagnostics, its C++11 standard support lagged the contemporary GCC version in the respects exercised by coan. With 3.2, this gap has been closed.

My Linux test harness for current coan development processes roughly 70K source files in a mixture of one-file parser test-cases, stress tests consuming 1000s of files and scenario tests consuming < 1K files. As well as reporting the test results, the harness accumulates and displays the totals of files consumed and the run time consumed in coan (it just passes each coan command line to the Linux time command and captures and adds up the reported numbers). The timings are flattered by the fact that any number of tests which take 0 measurable time will all add up to 0, but the contribution of such tests is negligible. The timing stats are displayed at the end of make check like this:

coan_test_timer: info: coan processed 70844 input_files.
coan_test_timer: info: run time in coan: 16.4 secs.
coan_test_timer: info: Average processing time per input file: 0.000231 secs.
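
The arithmetic behind those three lines can be sketched roughly as below. This is only an illustration under assumptions, not coan's actual harness: the names `RunSample`, `Totals`, `sum_runs` and `average_secs_per_file` are hypothetical, and the per-run figures are assumed to have been parsed already from the output of the time command.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One timed invocation of the tool: its run time and the number of
// input files it consumed. (Hypothetical type, for illustration only.)
struct RunSample {
    double seconds;
    std::size_t files;
};

// Totals accumulated over a whole test run.
struct Totals {
    double seconds;
    std::size_t files;
};

// Add up the reported run times and file counts of every invocation.
Totals sum_runs(const std::vector<RunSample>& runs) {
    Totals t{0.0, 0};
    for (const RunSample& r : runs) {
        t.seconds += r.seconds;
        t.files += r.files;
    }
    return t;
}

// Average processing time per input file, in seconds.
double average_secs_per_file(const Totals& t) {
    assert(t.files > 0);
    return t.seconds / static_cast<double>(t.files);
}
```

With the totals reported above (16.4 seconds over 70844 files), `average_secs_per_file` gives roughly 0.000231 seconds, i.e. about 231 microseconds per file.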

I compared the test harness performance as between GCC 4.7.2 and Clang 3.2, all things being equal except the compilers. As of Clang 3.2, I no longer require any preprocessor differentiation between code tracts that GCC will compile and Clang alternatives. I built to the same C++ library (GCC's) in each case and ran all the comparisons consecutively in the same terminal session.

The default optimization level for my release build is -O2. I also successfully tested builds at -O3. I tested each configuration 3 times back-to-back and averaged the 3 outcomes, with the following results. The number in a data-cell is the average number of microseconds consumed by the coan executable to process each of the ~70K input files (read, parse and write output and diagnostics).

          | -O2 | -O3 |O2/O3|
----------|-----|-----|-----|
GCC-4.7.2 | 231 | 237 |0.97 |
----------|-----|-----|-----|
Clang-3.2 | 234 | 186 |1.25 |
----------|-----|-----|-----|
GCC/Clang |0.99 |1.27 |     |

Any particular application is very likely to have traits that play unfairly to a compiler's strengths or weaknesses. Rigorous benchmarking employs diverse applications. With that well in mind, the noteworthy features of these data are:

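  1. -O3 optimization is marginally detrimental to GCC
  2. -O3 optimization is importantly beneficial to Clang
  3. At -O2 optimization, GCC is faster than Clang by just a whisker
  4. At -O3 optimization, Clang is importantly faster than GCC.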
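
For readers who want to check the ratio cells in the table, here is a minimal sketch of how they can be derived from the per-file timings. The names `Timings`, `o2_over_o3` and `gcc_over_clang` are illustrative assumptions, not anything taken from coan.

```cpp
#include <cassert>

// Mean microseconds per input file for one compiler at the two
// optimization levels. (Hypothetical type, for illustration only.)
struct Timings {
    double o2;
    double o3;
};

// A ratio above 1 means the -O3 build takes less time than the -O2 build.
double o2_over_o3(const Timings& t) {
    assert(t.o3 > 0.0);
    return t.o2 / t.o3;
}

// A ratio above 1 means Clang's build takes less time than GCC's.
double gcc_over_clang(double gcc_usecs, double clang_usecs) {
    assert(clang_usecs > 0.0);
    return gcc_usecs / clang_usecs;
}
```

For the figures above, `gcc_over_clang(237, 186)` comes to about 1.27 and `gcc_over_clang(231, 234)` to about 0.99, matching the bottom row of the table.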

A further interesting comparison of the two compilers emerged by accident shortly after those findings. Coan liberally employs smart pointers and one such is heavily exercised in the file handling. This particular smart-pointer type had been typedef'd in prior releases for the sake of compiler-differentiation, to be an std::unique_ptr<X> if the configured compiler had sufficiently mature support for its usage as that, and otherwise an std::shared_ptr<X>. The bias to std::unique_ptr was foolish, since these pointers were in fact transferred around, but std::unique_ptr looked like the fitter option for replacing std::auto_ptr at a point when the C++11 variants were novel to me.
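
A compiler-dependent typedef of the kind described might look roughly like the sketch below. coan's real source is not shown in this answer, so the macro `HAVE_MATURE_UNIQUE_PTR`, the pointee type `X`, the alias `file_ptr` and the two helper functions are all hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct X { int id; };  // stand-in for the real pointee type

#if defined(HAVE_MATURE_UNIQUE_PTR)
// Chosen when the configured compiler supports unique_ptr well enough.
typedef std::unique_ptr<X> file_ptr;
#else
// Fallback: shared ownership, copyable as well as movable.
typedef std::shared_ptr<X> file_ptr;
#endif

// Written so that it compiles with either alias.
file_ptr make_file(int id) {
    return file_ptr(new X{id});
}

// Takes ownership of the pointer; callers hand it over with std::move,
// which both pointer types support.
int consume(file_ptr p) {
    assert(p.get() != nullptr);
    return p->id;
}
```

Code that only ever moves the pointer, as in `consume(std::move(p))`, builds unchanged under either definition, which is what makes a one-line switch between the two types possible.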

In the course of experimental builds to gauge Clang 3.2's continued need for this and similar differentiation, I inadvertently built with std::shared_ptr<X> when I had intended to build with std::unique_ptr<X>, and was surprised to observe that the resulting executable, with default -O2 optimization, was the fastest I had seen, sometimes achieving 184 microseconds per input file. With this one change to the source code, the corresponding results were these:

          | -O2 | -O3 |O2/O3|
----------|-----|-----|-----|
GCC-4.7.2 | 234 | 234 |1.00 |
----------|-----|-----|-----|
Clang-3.2 | 188 | 187 |1.00 |
----------|-----|-----|-----|
GCC/Clang |1.24 |1.25 |     |

The points of note here are:

  1. Neither compiler now benefits at all from -O3 optimization.
  2. Clang beats GCC just as importantly at each level of optimization.
  3. GCC's performance is only marginally affected by the smart-pointer type change.
  4. Clang's -O2 performance is importantly affected by the smart-pointer type change.

Before and after the smart-pointer type change, Clang is able to build a substantially faster coan executable at -O3 optimisation, and it can build an equally faster executable at -O2 and -O3 when that pointer-type is the best one - std::shared_ptr<X> - for the job.

An obvious question that I am not competent to comment upon is why Clang should be able to find a 25% -O2 speed-up in my application when a heavily used smart-pointer-type is changed from unique to shared, while GCC is indifferent to the same change. Nor do I know whether I should cheer or boo the discovery that Clang's -O2 optimization harbours such huge sensitivity to the wisdom of my smart-pointer choices.

UPDATE: GCC 4.8.1 v clang 3.3

The corresponding results now are:

          | -O2 | -O3 |O2/O3|
----------|-----|-----|-----|
GCC-4.8.1 | 442 | 443 |1.00 |
----------|-----|-----|-----|
Clang-3.3 | 374 | 370 |1.01 |
----------|-----|-----|-----|
GCC/Clang |1.18 |1.20 |     |

The fact that all four executables now take a much greater average time than previously to process 1 file does not reflect on the latest compilers' performance. It is due to the fact that the later development branch of the test application has taken on a lot of parsing sophistication in the meantime and pays for it in speed. Only the ratios are significant.

The points of note now are not arrestingly novel:

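  • GCC is indifferent to -O3 optimization
  • clang benefits very marginally from -O3 optimization
  • clang beats GCC by a similarly important margin at each level of optimization.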

Comparing these results with those for GCC 4.7.2 and clang 3.2, it stands out that GCC has clawed back about a quarter of clang's lead at each optimization level. But since the test application has been heavily developed in the meantime one cannot confidently attribute this to a catch-up in GCC's code-generation. (This time, I have noted the application snapshot from which the timings were obtained and can use it again.)

UPDATE: GCC 4.8.2 v clang 3.4

I finished the update for GCC 4.8.1 v Clang 3.3 saying that I would stick to the same coan snapshot for further updates. But I decided instead to test on that snapshot (rev. 301) and on the latest development snapshot I have that passes its test suite (rev. 619). This gives the results a bit of longitude, and I had another motive:

My original posting noted that I had devoted no effort to optimizing coan for speed. This was still the case as of rev. 301. However, after I had built the timing apparatus into the coan test harness, every time I ran the test suite the performance impact of the latest changes stared me in the face. I saw that it was often surprisingly big and that the trend was more steeply negative than I felt to be merited by gains in functionality.

By rev. 308 the average processing time per input file in the test suite had well more than doubled since the first posting here. At that point I made a U-turn on my 10-year policy of not bothering about performance. In the intensive spate of revisions up to 619 performance was always a consideration and a large number of them went purely to rewriting key load-bearers on fundamentally faster lines (though without using any non-standard compiler features to do so). It would be interesting to see each compiler's reaction to this U-turn.

Here is the now familiar timings matrix for the latest two compilers' builds of rev.301:

coan - rev.301 results

          | -O2 | -O3 |O2/O3|
----------|-----|-----|-----|
GCC-4.8.2 | 428 | 428 |1.00 |
----------|-----|-----|-----|
Clang-3.4 | 390 | 365 |1.07 |
----------|-----|-----|-----|
GCC/Clang |1.10 |1.17 |     |

The story here is only marginally changed from GCC-4.8.1 and Clang-3.3. GCC's showing is a trifle better. Clang's is a trifle worse. Noise could well account for this. Clang still comes out ahead by -O2 and -O3 margins that wouldn't matter in most applications but would matter to quite a few.

And here is the matrix for rev. 619.

coan - rev.619 results

          | -O2 | -O3 |O2/O3|
----------|-----|-----|-----|
GCC-4.8.2 | 210 | 208 |1.01 |
----------|-----|-----|-----|
Clang-3.4 | 252 | 250 |1.01 |
----------|-----|-----|-----|
GCC/Clang |0.83 |0.83 |     |

Taking the 301 and the 619 figures side by side, several points speak out.

  • I was aiming to write faster code, and both compilers emphatically vindicate my efforts. But:

  • GCC repays those efforts far more generously than Clang. At -O2 optimization Clang's 619 build is about 55% faster than its 301 build (390 v 252 microseconds per file); at -O3 Clang's improvement is about 46% (365 v 250). Good, but at each optimization level GCC's 619 build is more than twice as fast as its 301.

  • GCC more than reverses Clang's former superiority. And at each optimization level GCC now beats Clang by 17%.

  • Clang's ability in the 301 build to get more leverage than GCC from -O3 optimization is gone in the 619 build. Neither compiler gains meaningfully from -O3.
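
Percentages like these depend on whether one measures speed gained or run time saved, so it can help to compute both from the table cells. The sketch below is one way to do that; the function names `speedup_factor` and `time_saved_percent` are illustrative assumptions.

```cpp
#include <cassert>

// How many times faster the new build runs: old time divided by new time.
double speedup_factor(double old_usecs, double new_usecs) {
    assert(new_usecs > 0.0);
    return old_usecs / new_usecs;
}

// Percentage of the old run time that the new build saves.
double time_saved_percent(double old_usecs, double new_usecs) {
    assert(old_usecs > 0.0);
    return 100.0 * (old_usecs - new_usecs) / old_usecs;
}
```

From the rev. 301 and rev. 619 tables, `speedup_factor(428, 210)` is about 2.04 for GCC at -O2, while `speedup_factor(365, 250)` is 1.46 for Clang at -O3, which corresponds to `time_saved_percent(365, 250)` of about 31.5. Comparing the two compilers on rev. 619 at -O2, `time_saved_percent(252, 210)` is about 16.7.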

I was sufficiently surprised by this reversal of fortunes that I suspected I might have accidentally made a sluggish build of clang 3.4 itself (since I built it from source). So I re-ran the 619 test with my distro's stock Clang 3.3. The results were practically the same as for 3.4.

So as regards reaction to the U-turn: On the numbers here, Clang has done much better than GCC at wringing speed out of my C++ code when I was giving it no help. When I put my mind to helping, GCC did a much better job than Clang.

I don't elevate that observation into a principle, but I take the lesson that "Which compiler produces the better binaries?" is a question that, even if you specify the test suite to which the answer shall be relative, still is not a clear-cut matter of just timing the binaries.

Is your better binary the fastest binary, or is it the one that best compensates for cheaply crafted code? Or best compensates for expensively crafted code that prioritizes maintainability and reuse over speed? It depends on the nature and relative weights of your motives for producing the binary, and of the constraints under which you do so.

And in any case, if you deeply care about building "the best" binaries then you had better keep checking how successive iterations of compilers deliver on your idea of "the best" over successive iterations of your code.
