Clang vs GCC - which produces better binaries?

Question

I'm currently using GCC, but I discovered Clang recently and I'm pondering switching. There is one deciding factor though - quality (speed, memory footprint, reliability) of binaries it produces - if gcc -O3 can produce a binary that runs 1% faster or takes 1% less memory, it's a deal-breaker.

Clang boasts better compile speeds and lower compile-time memory footprint than GCC, but I'm really interested in benchmarks/comparisons of resulting compiled software - could you point me to some or describe your experiences?

Recommended answer

Here are some up-to-date albeit narrow findings of mine with GCC 4.7.2 and Clang 3.2 for C++.

UPDATE: GCC 4.8.1 v clang 3.3 comparison appended below.

UPDATE: GCC 4.8.2 v clang 3.4 comparison is appended to that.

I maintain an OSS tool that is built for Linux with both GCC and Clang, and with Microsoft's compiler for Windows. The tool, coan, is a preprocessor and analyser of C/C++ source files and codelines of such: its computational profile majors on recursive-descent parsing and file-handling. The development branch (to which these results pertain) comprises at present around 11K LOC in about 90 files. It is coded, now, in C++ that is rich in polymorphism and templates but is still mired in many patches by its not-so-distant past in hacked-together C. Move semantics are not expressly exploited. It is single-threaded. I have devoted no serious effort to optimizing it, while the "architecture" remains so largely ToDo.

I employed Clang prior to 3.2 only as an experimental compiler because, despite its superior compilation speed and diagnostics, its C++11 standard support lagged the contemporary GCC version in the respects exercised by coan. With 3.2, this gap has been closed.

My Linux test harness for current coan development processes roughly 70K source files in a mixture of one-file parser test-cases, stress tests consuming 1000s of files and scenario tests consuming < 1K files. As well as reporting the test results, the harness accumulates and displays the totals of files consumed and the run time consumed in coan (it just passes each coan command line to the Linux time command and captures and adds up the reported numbers). The timings are flattered by the fact that any number of tests which take 0 measurable time will all add up to 0, but the contribution of such tests is negligible. The timing stats are displayed at the end of make check like this:

coan_test_timer: info: coan processed 70844 input_files.
coan_test_timer: info: run time in coan: 16.4 secs.
coan_test_timer: info: Average processing time per input file: 0.000231 secs.
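
As a rough illustration of the accumulate-and-average idea behind that report, here is a minimal sketch. It is not the actual coan harness: the real one relies on the Linux time command, whereas this sketch swaps in Python's wall-clock timer, and the names `timed_run`, `summarize` and `workload` are hypothetical.

```python
import subprocess
import sys
import time


def timed_run(argv):
    """Run one command line and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=False,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def summarize(total_files, total_secs):
    """Format three summary lines in the style of the report shown above."""
    average = total_secs / total_files
    return [
        f"coan_test_timer: info: coan processed {total_files} input_files.",
        f"coan_test_timer: info: run time in coan: {total_secs:.1f} secs.",
        "coan_test_timer: info: Average processing time per input file: "
        f"{average:.6f} secs.",
    ]


# Hypothetical stand-in workload: (command line, number of input files it consumes).
# A real harness would list coan command lines here instead.
workload = [([sys.executable, "-c", "pass"], 1)] * 3

total_files = sum(n for _, n in workload)
total_secs = sum(timed_run(argv) for argv, _ in workload)
print("\n".join(summarize(total_files, total_secs)))
```

Feeding the totals quoted above into `summarize(70844, 16.4)` gives an average of 0.000231 secs per file, which is the 231 microseconds that reappears in the first results table.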

I compared the test harness performance as between GCC 4.7.2 and Clang 3.2, all things being equal except the compilers. As of Clang 3.2, I no longer require any preprocessor differentiation between code tracts that GCC will compile and Clang alternatives. I built to the same C++ library (GCC's) in each case and ran all the comparisons consecutively in the same terminal session.

The default optimization level for my release build is -O2. I also successfully tested builds at -O3. I tested each configuration 3 times back-to-back and averaged the 3 outcomes, with the following results. The number in a data-cell is the average number of microseconds consumed by the coan executable to process each of the ~70K input files (read, parse and write output and diagnostics).

          | -O2 | -O3 |O2/O3|
----------|-----|-----|-----|
GCC-4.7.2 | 231 | 237 |0.97 |
----------|-----|-----|-----|
Clang-3.2 | 234 | 186 |1.25 |
----------|-----|-----|-----|
GCC/Clang |0.99 |1.27 |     |

Any particular application is very likely to have traits that play unfairly to a compiler's strengths or weaknesses. Rigorous benchmarking employs diverse applications. With that well in mind, the noteworthy features of these data are:


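  1. -O3 optimization is marginally detrimental to GCC
  2. -O3 optimization is importantly beneficial to Clang
  3. At -O2 optimization, GCC is faster than Clang by just a whisker
  4. At -O3 optimization, Clang is importantly faster than GCC.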
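
The ratio cells behind these observations can be recomputed from the timing cells. The sketch below does that for the GCC 4.7.2 / Clang 3.2 table; the dictionary layout and the `ratio` helper are illustrative assumptions, not part of the test harness. A value above 1 means the numerator is the slower of the two. Because the inputs are the rounded cells, a recomputed ratio can differ from the published one in the last digit (Clang's O2/O3 comes out slightly above the tabulated 1.25), presumably because the published ratios were taken from unrounded timings.

```python
# Timing cells from the GCC 4.7.2 / Clang 3.2 table,
# in average microseconds per input file.
timings = {
    "GCC-4.7.2": {"-O2": 231, "-O3": 237},
    "Clang-3.2": {"-O2": 234, "-O3": 186},
}


def ratio(numerator, denominator):
    """Return numerator/denominator rounded to two decimals."""
    return round(numerator / denominator, 2)


# Row ratios: -O2 time relative to -O3 time for each compiler.
o2_over_o3 = {name: ratio(t["-O2"], t["-O3"]) for name, t in timings.items()}

# Column ratios: GCC time relative to Clang time at each optimization level.
gcc_over_clang = {
    level: ratio(timings["GCC-4.7.2"][level], timings["Clang-3.2"][level])
    for level in ("-O2", "-O3")
}

print(o2_over_o3)
print(gcc_over_clang)
```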

A further interesting comparison of the two compilers emerged by accident shortly after those findings. Coan liberally employs smart pointers and one such is heavily exercised in the file handling. This particular smart-pointer type had been typedef'd in prior releases for the sake of compiler-differentiation, to be an std::unique_ptr<X> if the configured compiler had sufficiently mature support for its usage as that, and otherwise an std::shared_ptr<X>. The bias to std::unique_ptr was foolish, since these pointers were in fact transferred around, but std::unique_ptr looked like the fitter option for replacing std::auto_ptr at a point when the C++11 variants were novel to me.

In the course of experimental builds to gauge Clang 3.2's continued need for this and similar differentiation, I inadvertently built std::shared_ptr<X> when I had intended to build std::unique_ptr<X>, and was surprised to observe that the resulting executable, with default -O2 optimization, was the fastest I had seen, sometimes achieving 184 microseconds per input file. With this one change to the source code, the corresponding results were these:

          | -O2 | -O3 |O2/O3|
----------|-----|-----|-----|
GCC-4.7.2 | 234 | 234 |1.00 |
----------|-----|-----|-----|
Clang-3.2 | 188 | 187 |1.00 |
----------|-----|-----|-----|
GCC/Clang |1.24 |1.25 |     |

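The points of note here are:

  1. Clang beats GCC just as importantly at each level of optimization.
  2. GCC's performance is only marginally affected by the smart-pointer type change.
  3. Clang's -O2 performance is importantly affected by the smart-pointer type change.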

Before and after the smart-pointer type change, Clang is able to build a substantially faster coan executable at -O3 optimisation, and it can build an equally faster executable at -O2 and -O3 when that pointer-type is the best one - std::shared_ptr<X> - for the job.

An obvious question that I am not competent to comment upon is why Clang should be able to find a 25% -O2 speed-up in my application when a heavily used smart-pointer-type is changed from unique to shared, while GCC is indifferent to the same change. Nor do I know whether I should cheer or boo the discovery that Clang's -O2 optimization harbours such huge sensitivity to the wisdom of my smart-pointer choices.

UPDATE: GCC 4.8.1 v clang 3.3

The corresponding results now are:

          | -O2 | -O3 |O2/O3|
----------|-----|-----|-----|
GCC-4.8.1 | 442 | 443 |1.00 |
----------|-----|-----|-----|
Clang-3.3 | 374 | 370 |1.01 |
----------|-----|-----|-----|
GCC/Clang |1.18 |1.20 |     |

The fact that all four executables now take a much greater average time than previously to process 1 file does not reflect on the latest compilers' performance. It is due to the fact that the later development branch of the test application has taken on a lot of parsing sophistication in the meantime and pays for it in speed. Only the ratios are significant.

The points of note now are not arrestingly novel:


  • GCC is indifferent to -O3 optimization

  • clang benefits very marginally from -O3 optimization


Comparing these results with those for GCC 4.7.2 and clang 3.2, it stands out that GCC has clawed back about a quarter of clang's lead at each optimization level. But since the test application has been heavily developed in the meantime one cannot confidently attribute this to a catch-up in GCC's code-generation. (This time, I have noted the application snapshot from which the timings were obtained and can use it again.)

UPDATE: GCC 4.8.2 v clang 3.4

I finished the update for GCC 4.8.1 v Clang 3.3 saying that I would stick to the same coan snapshot for further updates. But I decided instead to test on that snapshot (rev. 301) and on the latest development snapshot I have that passes its test suite (rev. 619). This gives the results a bit of longitude, and I had another motive:

My original posting noted that I had devoted no effort to optimizing coan for speed. This was still the case as of rev. 301. However, after I had built the timing apparatus into the coan test harness, every time I ran the test suite the performance impact of the latest changes stared me in the face. I saw that it was often surprisingly big and that the trend was more steeply negative than I felt to be merited by gains in functionality.

By rev. 308 the average processing time per input file in the test suite had well more than doubled since the first posting here. At that point I made a U-turn on my 10-year policy of not bothering about performance. In the intensive spate of revisions up to 619 performance was always a consideration and a large number of them went purely to rewriting key load-bearers on fundamentally faster lines (though without using any non-standard compiler features to do so). It would be interesting to see each compiler's reaction to this U-turn.

Here is the now familiar timings matrix for the latest two compilers' builds of rev.301:

coan - rev.301 results

          | -O2 | -O3 |O2/O3|
----------|-----|-----|-----|
GCC-4.8.2 | 428 | 428 |1.00 |
----------|-----|-----|-----|
Clang-3.4 | 390 | 365 |1.07 |
----------|-----|-----|-----|
GCC/Clang | 1.1 |1.17 |     |

The story here is only marginally changed from GCC-4.8.1 and Clang-3.3. GCC's showing is a trifle better. Clang's is a trifle worse. Noise could well account for this. Clang still comes out ahead by -O2 and -O3 margins that wouldn't matter in most applications but would matter to quite a few.

And here is the matrix for rev. 619.

coan - rev.619 results

          | -O2 | -O3 |O2/O3|
----------|-----|-----|-----|
GCC-4.8.2 | 210 | 208 |1.01 |
----------|-----|-----|-----|
Clang-3.4 | 252 | 250 |1.01 |
----------|-----|-----|-----|
GCC/Clang |0.83 |0.83 |     |

Taking the 301 and the 619 figures side by side, several points speak out.


  • I was aiming to write faster code, and both compilers emphatically vindicate my efforts. But:

  • GCC repays those efforts far more generously than Clang. At -O2 optimization Clang's 619 build is 46% faster than its 301 build: at -O3 Clang's improvement is 31%. Good, but at each optimization level GCC's 619 build is more than twice as fast as its 301.

  • GCC more than reverses Clang's former superiority. And at each optimization level GCC now beats Clang by 17%.

  • Clang's ability in the 301 build to get more leverage than GCC from -O3 optimization is gone in the 619 build. Neither compiler gains meaningfully from -O3.
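
Percentages like the ones in these points depend on the convention used: a speed-up (old time / new time - 1) reads differently from a time saving (1 - new time / old time). The sketch below computes both from the rev. 301 and rev. 619 table cells. The helper names are hypothetical, and since the inputs are rounded cells, the output need not match the quoted percentages exactly.

```python
# Average microseconds per input file, copied from the rev. 301 and rev. 619 tables.
rev301 = {
    "GCC-4.8.2": {"-O2": 428, "-O3": 428},
    "Clang-3.4": {"-O2": 390, "-O3": 365},
}
rev619 = {
    "GCC-4.8.2": {"-O2": 210, "-O3": 208},
    "Clang-3.4": {"-O2": 252, "-O3": 250},
}


def speedup_pct(old, new):
    """How much faster the new build is: 100 means twice as fast."""
    return (old / new - 1) * 100


def time_saved_pct(old, new):
    """How much less time the new build takes: 50 means half the time."""
    return (1 - new / old) * 100


# Compare each compiler's rev. 619 build with its own rev. 301 build.
for compiler in rev301:
    for level in ("-O2", "-O3"):
        old, new = rev301[compiler][level], rev619[compiler][level]
        print(f"{compiler} {level}: {speedup_pct(old, new):.0f}% faster, "
              f"{time_saved_pct(old, new):.0f}% less time")
```

The same two helpers apply across compilers within one revision: for the rev. 619 -O2 cells, `time_saved_pct(252, 210)` is the sense in which GCC beats Clang by about 17%.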

I was sufficiently surprised by this reversal of fortunes that I suspected I might have accidentally made a sluggish build of clang 3.4 itself (since I built it from source). So I re-ran the 619 test with my distro's stock Clang 3.3. The results were practically the same as for 3.4.

So as regards reaction to the U-turn: On the numbers here, Clang has done much better than GCC at wringing speed out of my C++ code when I was giving it no help. When I put my mind to helping, GCC did a much better job than Clang.

I don't elevate that observation into a principle, but I take the lesson that "Which compiler produces the better binaries?" is a question that, even if you specify the test suite to which the answer shall be relative, still is not a clear-cut matter of just timing the binaries.

Is your better binary the fastest binary, or is it the one that best compensates for cheaply crafted code? Or best compensates for expensively crafted code that prioritizes maintainability and reuse over speed? It depends on the nature and relative weights of your motives for producing the binary, and of the constraints under which you do so.

And in any case, if you deeply care about building "the best" binaries then you had better keep checking how successive iterations of compilers deliver on your idea of "the best" over successive iterations of your code.
