How does C++ linking work in practice?

Question

How does C++ linking work in practice? What I am looking for is a detailed explanation about how the linking happens, and not what commands do the linking.

There's already a similar question about compilation which doesn't go into too much detail: How does the compilation/linking process work?

Recommended answer

EDIT: I have moved this answer to the duplicate: https://stackoverflow.com/a/33690144/895245

This answer focuses on address relocation, which is one of the crucial functions of linking.

A minimal example will be used to clarify the concept.

Summary: relocation edits the .text section of object files to translate:

  • object file addresses
  • into the final addresses of the executable

This must be done by the linker because the compiler only sees one input file at a time, but we must know about all object files at once to decide how to:

  • resolve undefined symbols, such as functions that are declared but not defined
  • keep the multiple .text and .data sections of multiple object files from clashing
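The two jobs above can be sketched with a toy model. This is only an illustration: the dictionary layout, the object names `main.o` and `lib.o`, the symbol `print_hello`, and the base address are all made up for the example and have nothing to do with the real ELF format or with how ld is implemented.

```python
# Toy model of the two linker jobs listed above. The "object file"
# dictionaries are a hypothetical simplification, not the ELF format.

def link(objects, base=0x400000):
    """Place each object's .text back to back and resolve symbols."""
    placed = {}    # object name -> final start address of its .text
    symbols = {}   # symbol name -> final address
    cursor = base
    for obj in objects:
        placed[obj["name"]] = cursor
        for sym, offset in obj["defined"].items():
            if sym in symbols:
                raise ValueError(f"duplicate symbol: {sym}")
            symbols[sym] = cursor + offset
        cursor += obj["text_size"]   # next object starts after this one

    # Every symbol that is only declared must be defined by some object.
    for obj in objects:
        for sym in obj["undefined"]:
            if sym not in symbols:
                raise ValueError(f"undefined reference to {sym}")
    return placed, symbols


main_o = {"name": "main.o", "text_size": 0x20,
          "defined": {"_start": 0x0}, "undefined": ["print_hello"]}
lib_o = {"name": "lib.o", "text_size": 0x30,
         "defined": {"print_hello": 0x10}, "undefined": []}

placed, symbols = link([main_o, lib_o])
print({name: hex(addr) for name, addr in symbols.items()})
```

Neither object could have computed the final address of `print_hello` on its own, because it depends on the size of everything placed before it.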

Prerequisites: minimal understanding of:

  • x86-64 or IA-32 assembly
  • the global structure of an ELF file. I have made a tutorial for that

Linking has nothing to do with C or C++ specifically: compilers just generate the object files. The linker then takes them as input without ever knowing what language compiled them. It might as well be Fortran.

So to reduce the crust, let's study a NASM x86-64 ELF Linux hello world:

section .data
    hello_world db "Hello world!", 10
section .text
    global _start
    _start:

        ; sys_write
        mov rax, 1
        mov rdi, 1
        mov rsi, hello_world
        mov rdx, 13
        syscall

        ; sys_exit
        mov rax, 60
        mov rdi, 0
        syscall

Assembled and linked with:

nasm -felf64 hello_world.asm            # creates hello_world.o
ld -o hello_world.out hello_world.o     # static ELF executable with no libraries

with NASM 2.10.09.

First we disassemble the .text section of the object file:

objdump -d hello_world.o

which gives:

0000000000000000 <_start>:
   0:   b8 01 00 00 00          mov    $0x1,%eax
   5:   bf 01 00 00 00          mov    $0x1,%edi
   a:   48 be 00 00 00 00 00    movabs $0x0,%rsi
  11:   00 00 00
  14:   ba 0d 00 00 00          mov    $0xd,%edx
  19:   0f 05                   syscall
  1b:   b8 3c 00 00 00          mov    $0x3c,%eax
  20:   bf 00 00 00 00          mov    $0x0,%edi
  25:   0f 05                   syscall

The crucial lines are:

   a:   48 be 00 00 00 00 00    movabs $0x0,%rsi
  11:   00 00 00

which should move the address of the hello world string into the rsi register, which is passed to the write system call.

But wait! How can the compiler possibly know where "Hello world!" will end up in memory when the program is loaded?

Well, it can't, especially after we link a bunch of .o files together with multiple .data sections.

Only the linker can do that, since only it will have all those object files.

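So the compiler just:

  • puts a placeholder value 0x0 on the compiled output
  • gives some extra information to the linker about how to modify the compiled code with the correct addresses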
This "extra information" is contained in the .rela.text section of the object file

.rela.text stands for "relocation of the .text section".

The word relocation is used because the linker will have to relocate the address from the object into the executable.

We can dump the .rela.text section with:

readelf -r hello_world.o

which contains:

Relocation section '.rela.text' at offset 0x340 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
00000000000c  000200000001 R_X86_64_64       0000000000000000 .data + 0

The format of this section is fixed and documented at: http://www.sco.com/developers/gabi/2003-12-17/ch4.reloc.html

Each entry tells the linker about one address which needs to be relocated; here we have only one, for the string.
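To make the readelf line less opaque, here is a minimal sketch of how one such record could be decoded by hand. It assumes the standard ELF64 layout of a relocation-with-addend record (three little-endian 64-bit fields: offset, info, addend) and rebuilds the raw bytes from the values printed above rather than reading them from the actual file.

```python
import struct

# One Elf64_Rela record: r_offset (u64), r_info (u64), r_addend (i64),
# little-endian on x86-64. The values are copied from the readelf
# output above; the bytes are reconstructed, not read from the file.
raw = struct.pack("<QQq", 0xC, 0x000200000001, 0)

r_offset, r_info, r_addend = struct.unpack("<QQq", raw)
sym_index = r_info >> 32          # upper 32 bits: symbol table index
rel_type = r_info & 0xFFFFFFFF    # lower 32 bits: relocation type

print(hex(r_offset), sym_index, rel_type, r_addend)
```

The Info column therefore packs two things: symbol table index 2, which readelf resolves to the name .data, and type 1, which readelf prints as R_X86_64_64.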

Simplifying a bit, for this particular line we have the following information:

  • Offset = C: the first byte of the .text that this entry changes.

If we look back at the disassembled text, it is exactly inside the critical movabs $0x0,%rsi, and those who know x86-64 instruction encoding will notice that this is the 64-bit address part of the instruction.
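A quick way to convince yourself of this is to slice the bytes by hand. The sketch below copies the .text bytes from the objdump listing of the object file and checks that offset 0xC is where the 8-byte immediate of the movabs starts; the variable names are only for illustration.

```python
# .text bytes of hello_world.o, copied from the objdump listing above.
text = bytes.fromhex(
    "b8 01 00 00 00 "                  # 0x00 mov    $0x1,%eax
    "bf 01 00 00 00 "                  # 0x05 mov    $0x1,%edi
    "48 be 00 00 00 00 00 00 00 00 "   # 0x0a movabs $0x0,%rsi
    "ba 0d 00 00 00 "                  # 0x14 mov    $0xd,%edx
    "0f 05 "                           # 0x19 syscall
    "b8 3c 00 00 00 "                  # 0x1b mov    $0x3c,%eax
    "bf 00 00 00 00 "                  # 0x20 mov    $0x0,%edi
    "0f 05"                            # 0x25 syscall
)

MOVABS_AT = 0xA
# 0x48 is the REX.W prefix, 0xBE is "mov imm64 into rsi" (0xB8 + 6).
opcode = text[MOVABS_AT:MOVABS_AT + 2]
imm_offset = MOVABS_AT + len(opcode)
immediate = text[imm_offset:imm_offset + 8]

print(opcode.hex(), hex(imm_offset), immediate.hex())
```

The two opcode bytes occupy 0xA and 0xB, so the placeholder zeros that the linker must overwrite begin at 0xC, matching the Offset column.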

  • Name = .data: the address points to the .data section.

  • Type = R_X86_64_64, which specifies exactly what calculation has to be done to translate the address.

This field is actually processor dependent, and thus documented in the AMD64 System V ABI extension, section 4.4 "Relocation".

That document says that R_X86_64_64 does:

  • Field = word64: 8 bytes, thus the 00 00 00 00 00 00 00 00 at address 0xC

  • Calculation = S + A

    • S is the value of the symbol named by the relocation entry, here the start of the .data section. Inside the object file that value is 0; the linker substitutes the section's final address.
    • A is the addend, which is 0 here. This is a field of the relocation entry.

So S + A == 0 within the object file, and after linking the field will hold the very first address of the .data section.
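The patching step itself is small enough to sketch. The function name below is hypothetical, and the code takes S to be the final address of .data, assuming the linker places that section at 0x6000d8 (the value ld ends up choosing for this example). It is a model of the S + A calculation, not of how ld is actually written.

```python
import struct

# .text bytes of hello_world.o, copied from the objdump listing.
text = bytes.fromhex(
    "b8 01 00 00 00 bf 01 00 00 00 "
    "48 be 00 00 00 00 00 00 00 00 "
    "ba 0d 00 00 00 0f 05 "
    "b8 3c 00 00 00 bf 00 00 00 00 0f 05"
)

R_OFFSET = 0xC        # Offset column of the relocation entry
ADDEND = 0            # A, from the relocation entry
DATA_ADDR = 0x6000D8  # S: assumed final address of .data


def apply_r_x86_64_64(section, offset, symbol_value, addend):
    """Overwrite the word64 field at `offset` with S + A, little-endian."""
    value = (symbol_value + addend) & 0xFFFFFFFFFFFFFFFF
    return section[:offset] + struct.pack("<Q", value) + section[offset + 8:]


linked = apply_r_x86_64_64(text, R_OFFSET, DATA_ADDR, ADDEND)
print(linked[0xA:0x14].hex(" "))
```

Only the eight bytes of the field change; every other byte of .text is copied through untouched.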

Now let's look at the text area of the executable ld generated for us:

objdump -d hello_world.out

gives:

00000000004000b0 <_start>:
  4000b0:   b8 01 00 00 00          mov    $0x1,%eax
  4000b5:   bf 01 00 00 00          mov    $0x1,%edi
  4000ba:   48 be d8 00 60 00 00    movabs $0x6000d8,%rsi
  4000c1:   00 00 00
  4000c4:   ba 0d 00 00 00          mov    $0xd,%edx
  4000c9:   0f 05                   syscall
  4000cb:   b8 3c 00 00 00          mov    $0x3c,%eax
  4000d0:   bf 00 00 00 00          mov    $0x0,%edi
  4000d5:   0f 05                   syscall

So the only thing that changed from the object file are the critical lines:

  4000ba:   48 be d8 00 60 00 00    movabs $0x6000d8,%rsi
  4000c1:   00 00 00

which now points to the address 0x6000d8 (d8 00 60 00 00 00 00 00 in little-endian) instead of 0x0.

Is this the right location for the hello_world string?

To decide, we have to check the program headers, which tell Linux where to load each section.

We dump them with:

readelf -l hello_world.out

which gives:

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x00000000000000d7 0x00000000000000d7  R E    200000
  LOAD           0x00000000000000d8 0x00000000006000d8 0x00000000006000d8
                 0x000000000000000d 0x000000000000000d  RW     200000

 Section to Segment mapping:
  Segment Sections...
   00     .text
   01     .data

This tells us that the .data section, which is the second one, starts at VirtAddr = 0x6000d8.

And the only thing on the data section is our hello world string.
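As a sanity check, the program headers can be turned into a small lookup. The sketch below hard-codes the two LOAD entries from the readelf output (the `SEGMENTS` list and the `locate` helper are names made up for this example) and maps a virtual address back to its segment and file offset.

```python
# The two LOAD program headers, copied from the readelf -l output.
SEGMENTS = [
    {"offset": 0x00, "vaddr": 0x400000, "filesz": 0xD7, "flags": "R E"},
    {"offset": 0xD8, "vaddr": 0x6000D8, "filesz": 0x0D, "flags": "RW"},
]


def locate(vaddr):
    """Return (segment index, file offset) for a loaded virtual address."""
    for index, seg in enumerate(SEGMENTS):
        if seg["vaddr"] <= vaddr < seg["vaddr"] + seg["filesz"]:
            return index, seg["offset"] + (vaddr - seg["vaddr"])
    raise ValueError(f"{vaddr:#x} is not inside any LOAD segment")


segment, file_offset = locate(0x6000D8)   # address patched into movabs
print(segment, hex(file_offset), SEGMENTS[segment]["flags"])
print(SEGMENTS[1]["filesz"] == len(b"Hello world!\n"))
```

The patched address lands at the very start of the second, read-write segment, and that segment's file size of 0xd bytes is exactly the 13 bytes of the string plus its newline.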
