How does the compilation/linking process work?


Question

I've been programming in C++ for a while and I was wondering how the compilation and linking process actually works.

Can someone explain please?

(Note: This is meant to be an entry to Stack Overflow's C++ FAQ. If you want to critique the idea of providing an FAQ in this form, then the posting on meta that started all this would be the place to do that. Answers to that question are monitored in the C++ chatroom, where the FAQ idea started out in the first place, so your answer is very likely to get read by those who came up with the idea.)

Solution

The compilation of a C++ program involves three steps:

  1. Preprocessing: the preprocessor takes a C++ source code file and deals with the #includes, #defines and other preprocessor directives. The output of this step is a "pure" C++ file without preprocessor directives.

  2. Compilation: the compiler takes the preprocessor's output and produces an object file from it.

  3. Linking: the linker takes the object files produced by the compiler and produces either a library or an executable file.
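The three steps can be tried out one at a time on a small file. The sketch below assumes a GCC-style driver (g++); the file names hello.cpp and main.o are hypothetical, and other toolchains spell these options differently.

```cpp
// hello.cpp (hypothetical file name)
//
// Assumed GCC-style commands, one per step:
//   g++ -E hello.cpp -o hello.ii      1. preprocess only
//   g++ -c hello.cpp -o hello.o       2. preprocess and compile to an object file
//   g++ hello.o main.o -o hello       3. link object files into an executable
// main.o stands for a second, hypothetical object file that defines main().
#include <string>

#define GREETING "hello"               // replaced during step 1

std::string greeting() {               // turned into machine code during step 2
    return std::string(GREETING) + ", world";
}
```

Opening the output of the first command shows the contents of the included header followed by the function, with GREETING already replaced by the string literal.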

Preprocessing

The preprocessor handles the preprocessor directives, like #include and #define. It is agnostic of the syntax of C++, which is why it must be used with care.

It works on one C++ source file at a time by replacing #include directives with the content of the respective files (which is usually just declarations), doing replacement of macros (#define), and selecting different portions of text depending on #if, #ifdef and #ifndef directives.

The preprocessor works on a stream of preprocessing tokens. Macro substitution is defined as replacing tokens with other tokens (the operator ## enables merging two tokens when it makes sense).
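A minimal sketch of token substitution and of the ## operator follows. All macro and function names are made up for illustration. BAD_SQUARE shows why the preprocessor's ignorance of C++ syntax calls for care.

```cpp
// Function-like macro: the argument tokens are pasted in as-is.
#define SQUARE(x) ((x) * (x))

// Same idea without parentheses: operator precedence is not respected.
#define BAD_SQUARE(x) x * x

// ## merges two tokens into one identifier.
#define MAKE_GETTER(name) int get_##name() { return name##_value; }

static int width_value = 3;
MAKE_GETTER(width)   // expands to: int get_width() { return width_value; }

int demo_good() { return SQUARE(1 + 2); }      // ((1 + 2) * (1 + 2)) == 9
int demo_bad()  { return BAD_SQUARE(1 + 2); }  // 1 + 2 * 1 + 2 == 5
```

The compiler never sees SQUARE or MAKE_GETTER; it only sees the tokens they expand to.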

After all this, the preprocessor produces a single output that is a stream of tokens resulting from the transformations described above. It also adds some special markers that tell the compiler where each line came from so that it can use those to produce sensible error messages.
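The exact form of these markers is up to the implementation. As a rough illustration, the standard #line directive has the same effect when written by hand: it overrides the line number and file name that the compiler uses from that point on. The file name widget.h below is hypothetical.

```cpp
#include <string>

// From here on, behave as if the text came from line 100 of "widget.h".
#line 100 "widget.h"
int reported_line() { return __LINE__; }          // this line counts as line 100
std::string reported_file() { return __FILE__; }  // file name is now "widget.h"
```

Diagnostics issued for code after the directive would likewise point at widget.h, which is how an error inside an included header can be reported against the header and not against the file that included it.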

Some errors can be produced at this stage with clever use of the #if and #error directives.
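For instance, a configuration macro can be validated before the compiler proper ever runs. BUFFER_SIZE and buffer_size() are hypothetical names used only for this sketch.

```cpp
// Provide a default if the build system did not define the macro.
#ifndef BUFFER_SIZE
#define BUFFER_SIZE 256
#endif

// Stop the build during preprocessing if the value is unusable.
#if BUFFER_SIZE < 16
#error "BUFFER_SIZE must be at least 16"
#endif

constexpr int buffer_size() { return BUFFER_SIZE; }
```

Building with the macro set to a small value on the command line (for example -DBUFFER_SIZE=8 with a GCC-style driver, which is an assumption about the toolchain) should make the #error branch fire.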

Compilation

The compilation step is performed on each output of the preprocessor. The compiler parses the pure C++ source code (now without any preprocessor directives) and converts it into assembly code. It then invokes the underlying back end (the assembler in the toolchain), which assembles that code into machine code and produces an actual binary file in some format (ELF, COFF, a.out, ...). This object file contains the compiled code (in binary form) of the symbols defined in the input. Symbols in object files are referred to by name.

Object files can refer to symbols that are not defined. This is the case when you use a declaration, and don't provide a definition for it. The compiler doesn't mind this, and will happily produce the object file as long as the source code is well-formed.
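A small sketch of this situation, with hypothetical function and file names. The first part only declares square, which is all the compiler needs to compile the call. The definition is placed in the same listing so that the sketch builds on its own, but in a real project it would normally live in a separate source file and be found by the linker.

```cpp
// main.cpp (hypothetical)
int square(int x);                   // declaration only, no definition here

int sum_of_squares(int a, int b) {
    return square(a) + square(b);    // the object file refers to square by name
}

// math.cpp (hypothetical): normally a separate source file, compiled on its own.
int square(int x) { return x * x; }
```

If the definition were removed, compiling this file to an object file would still succeed, and the missing symbol would only be reported at link time.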

Compilers usually let you stop compilation at this point. This is very useful because with it you can compile each source code file separately. The advantage this provides is that you don't need to recompile everything if you only change a single file.

The produced object files can be put in special archives called static libraries, for easier reusing later on.

It's at this stage that "regular" compiler errors, like syntax errors or failed overload resolution errors, are reported.
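As an illustration of a failed overload resolution, consider the hypothetical overload set below. The two active calls each have an exact match. The commented-out call passes an int, which converts equally well to long and to double, so the compiler would reject it as ambiguous.

```cpp
int pick(long)   { return 1; }
int pick(double) { return 2; }

int use_long()   { return pick(0L); }   // exact match for pick(long)
int use_double() { return pick(0.0); }  // exact match for pick(double)

// int broken() { return pick(0); }     // error: call of overloaded pick(int) is ambiguous
```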

Linking

The linker is what produces the final compilation output from the object files the compiler produced. This output can be either a shared (or dynamic) library (and while the name is similar, they haven't got much in common with static libraries mentioned earlier) or an executable.

It links all the object files by replacing the references to undefined symbols with the correct addresses. Each of these symbols can be defined in other object files or in libraries. If they are defined in libraries other than the standard library, you need to tell the linker about them.

At this stage the most common errors are missing definitions or duplicate definitions. The former means that either the definitions don't exist (i.e. they are not written), or that the object files or libraries where they reside were not given to the linker. The latter is obvious: the same symbol was defined in two different object files or libraries.
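A frequent source of duplicate definitions is a function defined in a header that several source files include. The sketch below uses a hypothetical header util.h. The commented-out version would be emitted once per including source file and collide at link time, while the inline version is one common way to avoid that.

```cpp
// util.h (hypothetical header included by several .cpp files)

// Non-inline definition in a header: every source file that includes it
// produces its own definition of next_id, so linking them together fails.
// int next_id() { static int id = 0; return ++id; }

// Marked inline, the same definition may appear in several object files
// and the linker keeps a single copy.
inline int next_id() {
    static int id = 0;
    return ++id;
}
```

The other usual fix is to keep only the declaration in the header and move the definition into exactly one source file.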
