Clang vs GCC - which produces faster binaries?

Question

I'm currently using GCC, but I discovered Clang recently and I'm pondering switching. There is one deciding factor though - the quality (speed, memory footprint, reliability) of the binaries it produces. If gcc -O3 can produce a binary that runs 1% faster, or Clang binaries take up more memory or just fail due to compiler bugs, it's a deal-breaker.

Clang boasts better compile speeds and a lower compile-time memory footprint than GCC, but I'm really interested in benchmarks/comparisons of the resulting compiled software - could you point me to some pre-existing resources or your own benchmarks?

Recommended answer

Here are some up-to-date albeit narrow findings of mine with GCC 4.7.2 and Clang 3.2 for C++.

UPDATE: GCC 4.8.1 v clang 3.3 comparison appended below.

UPDATE: GCC 4.8.2 v clang 3.4 comparison is appended to that.

I maintain an OSS tool that is built for Linux with both GCC and Clang, and with Microsoft's compiler for Windows. The tool, coan, is a preprocessor and analyser of C/C++ source files and codelines of such: its computational profile majors on recursive-descent parsing and file-handling. The development branch (to which these results pertain) comprises at present around 11K LOC in about 90 files. It is coded, now, in C++ that is rich in polymorphism and templates, but it is still mired in many patches by its not-so-distant past in hacked-together C. Move semantics are not expressly exploited. It is single-threaded. I have devoted no serious effort to optimizing it, while the "architecture" remains so largely ToDo.

I employed Clang prior to 3.2 only as an experimental compiler because, despite its superior compilation speed and diagnostics, its C++11 standard support lagged the contemporary GCC version in the respects exercised by coan. With 3.2, this gap has been closed.

My Linux test harness for current coan development processes roughly 70K source files in a mixture of one-file parser test-cases, stress tests consuming 1000s of files and scenario tests consuming < 1K files.

As well as reporting the test results, the harness accumulates and displays the totals of files consumed and the run time consumed in coan (it just passes each coan command line to the Linux time command and captures and adds up the reported numbers). The timings are flattered by the fact that any number of tests which take 0 measurable time will all add up to 0, but the contribution of such tests is negligible. The timing stats are displayed at the end of make check like this:

coan_test_timer: info: coan processed 70844 input_files.
coan_test_timer: info: run time in coan: 16.4 secs.
coan_test_timer: info: Average processing time per input file: 0.000231 secs.
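
The accumulation described above can be sketched roughly as follows. This is only an illustration under assumptions: the real coan_test_timer is not shown in the post, so the TimingTotals type and its member names are hypothetical, and the per-command figures are assumed to have been captured already from the time command.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical accumulator; the real coan_test_timer implementation is not shown in the post.
struct TimingTotals {
    long files = 0;        // input files consumed so far
    double seconds = 0.0;  // run time consumed in coan so far

    // Add the figures reported for one coan command line.
    // A test that takes no measurable time contributes 0.0 here.
    void add(long files_consumed, double run_seconds) {
        assert(files_consumed >= 0 && run_seconds >= 0.0);
        files += files_consumed;
        seconds += run_seconds;
    }

    // Average processing time per input file, in seconds (0 if no files yet).
    double average_per_file() const {
        return files > 0 ? seconds / static_cast<double>(files) : 0.0;
    }

    // Render the three summary lines in the same shape as the harness output.
    std::string summary() const {
        char buf[512];
        std::snprintf(buf, sizeof buf,
                      "coan_test_timer: info: coan processed %ld input_files.\n"
                      "coan_test_timer: info: run time in coan: %.1f secs.\n"
                      "coan_test_timer: info: Average processing time per input file: %.6f secs.\n",
                      files, seconds, average_per_file());
        return std::string(buf);
    }
};
```

Calling add() once per coan command line and printing summary() at the end gives the totals in the shape shown above.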

I compared the test harness performance as between GCC 4.7.2 and Clang 3.2, all things being equal except the compilers. As of Clang 3.2, I no longer require any preprocessor differentiation between code tracts that GCC will compile and Clang alternatives. I built to the same C++ library (GCC's) in each case and ran all the comparisons consecutively in the same terminal session.

The default optimization level for my release build is -O2. I also successfully tested builds at -O3. I tested each configuration 3 times back-to-back and averaged the 3 outcomes, with the following results. The number in a data-cell is the average number of microseconds consumed by the coan executable to process each of the ~70K input files (read, parse and write output and diagnostics).

          | -O2 | -O3 |O2/O3|
----------|-----|-----|-----|
GCC-4.7.2 | 231 | 237 |0.97 |
----------|-----|-----|-----|
Clang-3.2 | 234 | 186 |1.25 |
----------|-----|-----|------
GCC/Clang |0.99 | 1.27|

Any particular application is very likely to have traits that play unfairly to a compiler's strengths or weaknesses. Rigorous benchmarking employs diverse applications. With that well in mind, the noteworthy features of these data are:

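  1. -O3 optimization is marginally detrimental to GCC
  2. -O3 optimization is importantly beneficial to Clang
  3. At -O2 optimization, GCC is faster than Clang by just a whisker
  4. At -O3 optimization, Clang is importantly faster than GCC.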
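
For anyone wanting to recompute the O2/O3 and GCC/Clang cells, here is a minimal sketch. The helper name ratio_2dp is my own, and rounding to two decimal places is an assumption about how the cells were produced. The published cells were presumably derived from unrounded averages, so recomputing from the rounded microsecond figures can differ in the last digit.

```cpp
#include <cassert>
#include <cmath>

// Ratio of two timings, rounded to two decimal places as in the table cells.
// Hypothetical helper: the post does not say how the ratios were rounded.
inline double ratio_2dp(double numerator_usecs, double denominator_usecs) {
    assert(denominator_usecs > 0.0);
    return std::round(numerator_usecs / denominator_usecs * 100.0) / 100.0;
}
```

For example, ratio_2dp(231, 237) gives GCC's O2/O3 cell and ratio_2dp(237, 186) gives the GCC/Clang cell at -O3. A value below 1 in the O2/O3 column means -O3 made that compiler's build slower.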

A further interesting comparison of the two compilers emerged by accident shortly after those findings. Coan liberally employs smart pointers and one such is heavily exercised in the file handling. This particular smart-pointer type had been typedef'd in prior releases for the sake of compiler-differentiation, to be an std::unique_ptr<X> if the configured compiler had sufficiently mature support for its usage as that, and otherwise an std::shared_ptr<X>. The bias to std::unique_ptr was foolish, since these pointers were in fact transferred around, but std::unique_ptr looked like the fitter option for replacing std::auto_ptr at a point when the C++11 variants were novel to me.
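
A compiler-differentiated typedef of that kind might look roughly like the sketch below. It is an illustration only: the macro EXAMPLE_HAVE_UNIQUE_PTR, the alias x_ptr, the placeholder struct X and both helper functions are hypothetical names, not coan's actual code.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical configuration macro; the post does not give coan's real one.
#ifndef EXAMPLE_HAVE_UNIQUE_PTR
#define EXAMPLE_HAVE_UNIQUE_PTR 0
#endif

// Placeholder for the pointee type that the post calls X.
struct X {
    int value;
    explicit X(int v) : value(v) {}
};

#if EXAMPLE_HAVE_UNIQUE_PTR
typedef std::unique_ptr<X> x_ptr;  // sole ownership, move-only
#else
typedef std::shared_ptr<X> x_ptr;  // shared ownership, copyable
#endif

// Factory written so that it compiles with either typedef.
inline x_ptr make_x(int value) {
    return x_ptr(new X(value));
}

// Handing a pointer on by value compiles under both configurations,
// provided the caller moves it in.
inline x_ptr pass_on(x_ptr p) {
    assert(p != nullptr);
    return p;
}
```

Code that restricts itself to construction, moves and dereferencing builds against either alias, which is what makes it possible to flip the pointer type with a one-line configuration change.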

In the course of experimental builds to gauge Clang 3.2's continued need for this and similar differentiation, I inadvertently built std::shared_ptr<X> when I had intended to build std::unique_ptr<X>, and was surprised to observe that the resulting executable, with default -O2 optimization, was the fastest I had seen, sometimes achieving 184 microseconds per input file. With this one change to the source code, the corresponding results were these:

          | -O2 | -O3 |O2/O3|
----------|-----|-----|-----|
GCC-4.7.2 | 234 | 234 |1.00 |
----------|-----|-----|-----|
Clang-3.2 | 188 | 187 |1.00 |
----------|-----|-----|------
GCC/Clang |1.24 |1.25 |

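The points of note here are:

  1. Neither compiler now benefits at all from -O3 optimization.
  2. Clang beats GCC just as importantly at each level of optimization.
  3. GCC's performance is only marginally affected by the smart-pointer type change.
  4. Clang's -O2 performance is importantly affected by the smart-pointer type change.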

Before and after the smart-pointer type change, Clang is able to build a substantially faster coan executable at -O3 optimisation, and it can build an equally faster executable at -O2 and -O3 when that pointer-type is the best one - std::shared_ptr<X> - for the job.

An obvious question that I am not competent to comment upon is why Clang should be able to find a 25% -O2 speed-up in my application when a heavily used smart-pointer-type is changed from unique to shared, while GCC is indifferent to the same change. Nor do I know whether I should cheer or boo the discovery that Clang's -O2 optimization harbours such huge sensitivity to the wisdom of my smart-pointer choices.

UPDATE: GCC 4.8.1 v clang 3.3

The corresponding results now are:

          | -O2 | -O3 |O2/O3|
----------|-----|-----|-----|
GCC-4.8.1 | 442 | 443 |1.00 |
----------|-----|-----|-----|
Clang-3.3 | 374 | 370 |1.01 |
----------|-----|-----|------
GCC/Clang |1.18 |1.20 |

The fact that all four executables now take a much greater average time than previously to process 1 file does not reflect on the latest compilers' performance. It is due to the fact that the later development branch of the test application has taken on a lot of parsing sophistication in the meantime and pays for it in speed. Only the ratios are significant.

The points of note now are not arrestingly novel:

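  • GCC is indifferent to -O3 optimization
  • clang benefits very marginally from -O3 optimization
  • clang beats GCC by a similarly important margin at each level of optimization.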

Comparing these results with those for GCC 4.7.2 and clang 3.2, it stands out that GCC has clawed back about a quarter of clang's lead at each optimization level. But since the test application has been heavily developed in the meantime, one cannot confidently attribute this to a catch-up in GCC's code-generation. (This time, I have noted the application snapshot from which the timings were obtained and can use it again.)

UPDATE: GCC 4.8.2 v clang 3.4

I finished the update for GCC 4.8.1 v Clang 3.3 saying that I would stick to the same coan snapshot for further updates. But I decided instead to test on that snapshot (rev. 301) and on the latest development snapshot I have that passes its test suite (rev. 619). This gives the results a bit of longitude, and I had another motive:

My original posting noted that I had devoted no effort to optimizing coan for speed. This was still the case as of rev. 301. However, after I had built the timing apparatus into the coan test harness, every time I ran the test suite the performance impact of the latest changes stared me in the face. I saw that it was often surprisingly big and that the trend was more steeply negative than I felt to be merited by gains in functionality.

By rev. 308 the average processing time per input file in the test suite had well more than doubled since the first posting here. At that point I made a U-turn on my 10 year policy of not bothering about performance. In the intensive spate of revisions up to 619, performance was always a consideration, and a large number of them went purely to rewriting key load-bearers on fundamentally faster lines (though without using any non-standard compiler features to do so). It would be interesting to see each compiler's reaction to this U-turn.

Here is the now familiar timings matrix for the latest two compilers' builds of rev. 301:

coan - rev.301 results

          | -O2 | -O3 |O2/O3|
----------|-----|-----|-----|
GCC-4.8.2 | 428 | 428 |1.00 |
----------|-----|-----|-----|
Clang-3.4 | 390 | 365 |1.07 |
----------|-----|-----|------
GCC/Clang | 1.1 | 1.17|

The story here is only marginally changed from GCC-4.8.1 and Clang-3.3. GCC's showing is a trifle better. Clang's is a trifle worse. Noise could well account for this. Clang still comes out ahead by -O2 and -O3 margins that wouldn't matter in most applications but would matter to quite a few.

And here is the matrix for rev. 619.

coan - rev.619 results

          | -O2 | -O3 |O2/O3|
----------|-----|-----|-----|
GCC-4.8.2 | 210 | 208 |1.01 |
----------|-----|-----|-----|
Clang-3.4 | 252 | 250 |1.01 |
----------|-----|-----|------
GCC/Clang |0.83 | 0.83|

Taking the 301 and the 619 figures side by side, several points speak out.

  • I was aiming to write faster code, and both compilers emphatically vindicate my efforts. But:

  • GCC repays those efforts far more generously than Clang. At -O2 optimization Clang's 619 build is 46% faster than its 301 build: at -O3 Clang's improvement is 31%. Good, but at each optimization level GCC's 619 build is more than twice as fast as its 301.

  • GCC more than reverses Clang's former superiority. And at each optimization level GCC now beats Clang by 17%.

  • Clang's ability in the 301 build to get more leverage than GCC from -O3 optimization is gone in the 619 build. Neither compiler gains meaningfully from -O3.
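
Statements like "N% faster" and "twice as fast" can be read in more than one way, so here is a minimal sketch of the two usual measures, applied to the table cells. The function names are my own. Because the cells are rounded averages, figures recomputed this way need not match the prose to the last percent.

```cpp
#include <cassert>

// Percentage of run time saved going from old_usecs to new_usecs.
inline double time_saved_pct(double old_usecs, double new_usecs) {
    assert(old_usecs > 0.0);
    return (old_usecs - new_usecs) / old_usecs * 100.0;
}

// Speed-up factor: how many times faster the new build is than the old one.
inline double speedup_factor(double old_usecs, double new_usecs) {
    assert(new_usecs > 0.0);
    return old_usecs / new_usecs;
}
```

On the -O2 cells, speedup_factor(428, 210) is a little over 2, which is the sense in which GCC's 619 build is more than twice as fast as its 301 build. time_saved_pct(252, 210) is a little under 17, which corresponds to the 0.83 GCC/Clang ratio for rev. 619.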

I was sufficiently surprised by this reversal of fortunes that I suspected I might have accidentally made a sluggish build of clang 3.4 itself (since I built it from source). So I re-ran the 619 test with my distro's stock Clang 3.3. The results were practically the same as for 3.4.

So as regards reaction to the U-turn: On the numbers here, Clang has done much better than GCC at wringing speed out of my C++ code when I was giving it no help. When I put my mind to helping, GCC did a much better job than Clang.

I don't elevate that observation into a principle, but I take the lesson that "Which compiler produces the better binaries?" is a question that, even if you specify the test suite to which the answer shall be relative, still is not a clear-cut matter of just timing the binaries.

Is your better binary the fastest binary, or is it the one that best compensates for cheaply crafted code? Or best compensates for expensively crafted code that prioritizes maintainability and reuse over speed? It depends on the nature and relative weights of your motives for producing the binary, and of the constraints under which you do so.

And in any case, if you deeply care about building "the best" binaries then you had better keep checking how successive iterations of compilers deliver on your idea of "the best" over successive iterations of your code.
