Why does C++ compilation take so long?


Question

Compiling a C++ file takes a very long time compared to C# and Java. It takes significantly longer to compile a C++ file than it would to run a normal-sized Python script. I'm currently using VC++, but it's the same with any compiler. Why is this?

The two reasons I could think of were loading header files and running the preprocessor, but that doesn't seem like it should explain why it takes so long.

Answer

Several reasons:

  • Header files: Every single compilation unit requires hundreds or even thousands of headers to be (1) loaded and (2) compiled. Every one of them typically has to be recompiled for every compilation unit, because the preprocessor means that the result of compiling a header can vary from one compilation unit to the next. (A macro defined in one compilation unit may change the content of the header.)

    This is probably the main reason: huge amounts of code have to be compiled for every compilation unit, and every header has to be compiled multiple times (once for every compilation unit that includes it).

  • Linking: Once compiled, all the object files have to be linked together. This is basically a monolithic process that cannot easily be parallelized, and it has to process your entire project.

  • Parsing: The syntax is extremely complicated to parse, depends heavily on context, and is very hard to disambiguate. This takes a lot of time.

  • Templates: In C#, List<T> is the only type that is compiled, no matter how many instantiations of List you have in your program. In C++, vector<int> is a completely separate type from vector<float>, and each one will have to be compiled separately.

    Add to this that templates make up a full Turing-complete "sub-language" that the compiler has to interpret, and this can become ridiculously complicated. Even relatively simple template metaprogramming code can define recursive templates that create dozens and dozens of template instantiations. Templates may also result in extremely complex types with ridiculously long names, adding a lot of extra work for the linker. (It has to compare a lot of symbol names, and if these names grow to many thousands of characters, that can become fairly expensive.)

    And of course, they exacerbate the problems with header files, because templates generally have to be defined in headers, which means far more code has to be parsed and compiled for every compilation unit. In plain C code, a header typically only contains forward declarations, but very little actual code. In C++, it is not uncommon for almost all the code to reside in header files.

  • Optimization: C++ allows for some very dramatic optimizations. C# or Java don't allow classes to be completely eliminated (they have to be there for reflection purposes), but even a simple C++ template metaprogram can easily generate dozens or hundreds of classes, all of which are inlined and eliminated again in the optimization phase.

    Moreover, a C++ program must be fully optimized by the compiler. A C# program can rely on the JIT compiler to perform additional optimizations at load time; C++ doesn't get any such "second chances". What the compiler generates is as optimized as it's going to get.

  • Machine code: C++ is compiled to machine code, which may be somewhat more complicated than the bytecode Java or .NET use (especially in the case of x86).
    (This is mentioned for completeness only, because it came up in comments and such. In practice, this step is unlikely to take more than a tiny fraction of the total compilation time.)
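
To make the header-file point from the list concrete, here is a minimal single-file sketch of why a header's compiled result cannot simply be reused across compilation units. The macro `USE_WIDE_IDS`, the alias `id_type` and the header name "ids.h" are hypothetical, invented for this illustration; the header's text is written inline to simulate what `#include` would paste in.

```cpp
#include <type_traits>

// Imagine this translation unit defines the macro before including a header.
// USE_WIDE_IDS is a hypothetical name used only for this illustration.
#define USE_WIDE_IDS

// --- begin: text that would normally live in a header such as "ids.h" ---
#ifdef USE_WIDE_IDS
using id_type = long long;   // seen by translation units that define the macro
#else
using id_type = int;         // seen by every other translation unit
#endif
// --- end of the simulated header ---

// The same header text produced a different type because of the macro,
// so the compiler cannot blindly reuse a result computed for another file.
static_assert(std::is_same<id_type, long long>::value,
              "this translation unit sees the wide variant");
```

Removing the `#define` line makes the very same "header" text yield `int` instead, which is why the compiler generally has to process each header again in every file that includes it.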

Most of these factors are shared by C code, which actually compiles fairly efficiently. The parsing step is a lot more complicated in C++, and can take up significantly more time, but the main offender is probably templates. They're useful, and make C++ a far more powerful language, but they also take their toll in terms of compilation speed.
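
As a rough illustration of the template costs described above, the sketch below shows two things: that `vector<int>` and `vector<float>` really are unrelated types, and that a small recursive template forces a chain of instantiations. `Factorial` and `factorial_of_10` are names made up for this example, not something from the original answer.

```cpp
#include <type_traits>
#include <vector>

// Each instantiation is its own type and is compiled separately.
static_assert(!std::is_same<std::vector<int>, std::vector<float>>::value,
              "vector<int> and vector<float> are distinct types");

// A recursive template: asking for Factorial<10> makes the compiler
// instantiate Factorial<9>, Factorial<8>, ... down to Factorial<0>.
template <unsigned N>
struct Factorial {
    static constexpr unsigned long long value = N * Factorial<N - 1>::value;
};

// Explicit specialization that terminates the recursion.
template <>
struct Factorial<0> {
    static constexpr unsigned long long value = 1;
};

// Evaluated entirely at compile time: eleven class instantiations
// for a single constant that ends up as a plain number in the binary.
constexpr unsigned long long factorial_of_10 = Factorial<10>::value;
```

All eleven `Factorial` classes exist only during compilation; after optimization nothing remains but the constant, which matches the point about classes being generated and then eliminated again.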
