How does C++ linking work in practice?


Question

How does C++ linking work in practice? What I am looking for is a detailed explanation about how the linking happens, and not what commands do the linking.

There's already a similar question about compilation which doesn't go into too much detail: How does the compilation/linking process work?

Answer

EDIT: I have moved this answer to the duplicate: http://stackoverflow.com/a/33690144/895245

This answer focuses on address relocation, which is one of the crucial functions of linking.

A minimal example will be used to clarify the concept.

0) Introduction

Summary: relocation edits the .text section of object files to translate:

  • object file address
  • into the final address of the executable

This must be done by the linker because the compiler only sees one input file at a time, but we must know about all object files at once to decide how to:

  • resolve undefined symbols like declared undefined functions
  • lay out the .text and .data sections of multiple object files so that they do not collide
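As a rough illustration of those two decisions, here is a toy model in Python. It is only a sketch and not how ld is implemented: the object names, symbol names, sizes and base address are all invented for the example.

```python
# Toy model of the two linker decisions listed above; not how ld is implemented.
# All object names, symbol names, sizes and the base address are made up.

objects = [
    {"name": "a.o", "size": 0x20, "defines": {"main": 0x0}, "needs": ["helper"]},
    {"name": "b.o", "size": 0x10, "defines": {"helper": 0x4}, "needs": []},
]

def link(objects, base):
    # Step 1: lay the sections out back to back so they do not overlap.
    section_start = {}
    address = base
    for obj in objects:
        section_start[obj["name"]] = address
        address += obj["size"]

    # Step 2: build one global symbol table holding final addresses.
    symbols = {}
    for obj in objects:
        for name, offset in obj["defines"].items():
            if name in symbols:
                raise ValueError(f"duplicate symbol: {name}")
            symbols[name] = section_start[obj["name"]] + offset

    # Step 3: every undefined reference must resolve to some definition.
    for obj in objects:
        for name in obj["needs"]:
            if name not in symbols:
                raise ValueError(f"undefined reference to {name}")
    return section_start, symbols

section_start, symbols = link(objects, base=0x400000)
print({name: hex(addr) for name, addr in symbols.items()})
```

Neither step can be done while looking at a.o alone: the address of helper depends on how large a.o's section is and on where b.o is placed after it.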

Prerequisites: minimal understanding of x86-64 assembly and of the overall structure of an ELF file.

Linking has nothing to do with C or C++ specifically: compilers just generate the object files. The linker then takes them as input without ever knowing what language compiled them. It might as well be Fortran.

So to reduce the cruft, let's study a NASM x86-64 ELF Linux hello world:

section .data
    hello_world db "Hello world!", 10
section .text
    global _start
    _start:

        ; sys_write
        mov rax, 1
        mov rdi, 1
        mov rsi, hello_world
        mov rdx, 13
        syscall

        ; sys_exit
        mov rax, 60
        mov rdi, 0
        syscall

assembled and linked with:

nasm -f elf64 -o hello_world.o hello_world.asm
ld -o hello_world.out hello_world.o

with NASM 2.10.09.

1) .text of .o

First we disassemble the .text section of the object file:

objdump -d hello_world.o

which gives:

0000000000000000 <_start>:
   0:   b8 01 00 00 00          mov    $0x1,%eax
   5:   bf 01 00 00 00          mov    $0x1,%edi
   a:   48 be 00 00 00 00 00    movabs $0x0,%rsi
  11:   00 00 00
  14:   ba 0d 00 00 00          mov    $0xd,%edx
  19:   0f 05                   syscall
  1b:   b8 3c 00 00 00          mov    $0x3c,%eax
  20:   bf 00 00 00 00          mov    $0x0,%edi
  25:   0f 05                   syscall

the crucial lines are:

   a:   48 be 00 00 00 00 00    movabs $0x0,%rsi
  11:   00 00 00

which should move the address of the hello world string into the rsi register, which is passed to the write system call.

But wait! How can the compiler possibly know where "Hello world!" will end up in memory when the program is loaded?

Well, it can't, especially after we link a bunch of .o files together with multiple .data sections.

Only the linker can do that, since only the linker has all those object files.

So the compiler just:

  • puts a placeholder value 0x0 in the compiled output
  • gives the linker some extra information on how to patch the compiled code with the correct addresses

This "extra information" is contained in the .rela.text section of the object file.

2) .rela.text

.rela.text stands for "relocation of the .text section".

The word relocation is used because the linker will have to relocate the address from the object into the executable.

We can display the .rela.text section with:

readelf -r hello_world.o

which contains:

Relocation section '.rela.text' at offset 0x340 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
00000000000c  000200000001 R_X86_64_64       0000000000000000 .data + 0

The format of this section is fixed, and is documented at: http://www.sco.com/developers/gabi/2003-12-17/ch4.reloc.html

Each entry tells the linker about one address which needs to be relocated. Here we have only one, for the string.
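To make the entry format concrete, the following Python sketch decodes one 64-bit relocation entry. It assumes the standard Elf64_Rela layout (r_offset, r_info, r_addend, 8 bytes each) and the usual split of r_info into a symbol index (upper 32 bits) and a type (lower 32 bits). The raw bytes are rebuilt from the values readelf printed rather than read from the file, and the function name is made up for the example.

```python
import struct

# Elf64_Rela layout: r_offset (u64), r_info (u64), r_addend (i64), little-endian.
ELF64_RELA = struct.Struct("<QQq")

R_X86_64_64 = 1  # relocation type number from the AMD64 System V ABI

def decode_rela(raw):
    """Split one 24-byte Elf64_Rela entry into its fields."""
    r_offset, r_info, r_addend = ELF64_RELA.unpack(raw)
    return {
        "offset": r_offset,
        "sym_index": r_info >> 32,    # what the ELF64_R_SYM macro computes
        "type": r_info & 0xFFFFFFFF,  # what the ELF64_R_TYPE macro computes
        "addend": r_addend,
    }

# Rebuild the entry from the Offset, Info and Addend columns shown by readelf.
raw_entry = ELF64_RELA.pack(0xC, 0x000200000001, 0)
entry = decode_rela(raw_entry)
print(entry)
```

The Info value 000200000001 therefore splits into symbol index 2, which readelf resolved to .data, and type 1, which it printed as R_X86_64_64.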

Simplifying a bit, for this particular line we have the following information:

  • Offset = C: the first byte of .text that this entry changes.

    If we look back at the disassembled text, it is exactly inside the critical movabs $0x0,%rsi, and those who know x86-64 instruction encoding will notice that this is where the 64-bit address operand of the instruction is encoded.

  • Name = .data: the address points to the .data section

  • Type = R_X86_64_64, which specifies exactly what calculation has to be done to translate the address.

    This field is actually processor dependent, and thus documented on the AMD64 System V ABI extension section 4.4 "Relocation".

    That document says that R_X86_64_64 does:

    • Field = word64: 8 bytes, thus the 00 00 00 00 00 00 00 00 at address 0xC

    • Calculation = S + A

      • S is the value of the symbol that the entry refers to, here the address at which the .data section ends up in the executable
      • A is the addend which is 0 here. This is a field of the relocation entry.

      So S + A is the very first address of the .data section, and that is what the linker writes over the placeholder.
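The calculation can be replayed by hand. The Python sketch below is only an illustration, with a made-up helper name: it takes the .text bytes from the objdump output of the object file and applies R_X86_64_64 as the ABI defines it, with S being the final address of the symbol the entry refers to. The value 0x6000d8 for S is an input here; it is simply the address that ld happened to choose for .data in this particular link.

```python
import struct

# .text of hello_world.o, copied from the objdump output of the object file.
text = bytearray.fromhex(
    "b801000000"            # mov    $0x1,%eax
    "bf01000000"            # mov    $0x1,%edi
    "48be0000000000000000"  # movabs $0x0,%rsi   (8-byte placeholder)
    "ba0d000000"            # mov    $0xd,%edx
    "0f05"                  # syscall
    "b83c000000"            # mov    $0x3c,%eax
    "bf00000000"            # mov    $0x0,%edi
    "0f05"                  # syscall
)

def apply_r_x86_64_64(section, offset, symbol_value, addend):
    """Write S + A as a little-endian word64 at the given section offset."""
    value = (symbol_value + addend) & 0xFFFFFFFFFFFFFFFF
    struct.pack_into("<Q", section, offset, value)

# Offset and addend come from the relocation entry; S is the address of .data.
apply_r_x86_64_64(text, offset=0xC, symbol_value=0x6000D8, addend=0)

# Show the patched movabs instruction (bytes 0xa through 0x13).
print(text[0xA:0x14].hex(" "))
```

Only the 8 bytes starting at offset 0xC change; the opcode bytes 48 be in front of them and every other instruction are left untouched.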

3) .text of .out

Now let's look at the .text section of the executable that ld generated for us:

objdump -d hello_world.out

gives:

00000000004000b0 <_start>:
  4000b0:   b8 01 00 00 00          mov    $0x1,%eax
  4000b5:   bf 01 00 00 00          mov    $0x1,%edi
  4000ba:   48 be d8 00 60 00 00    movabs $0x6000d8,%rsi
  4000c1:   00 00 00
  4000c4:   ba 0d 00 00 00          mov    $0xd,%edx
  4000c9:   0f 05                   syscall
  4000cb:   b8 3c 00 00 00          mov    $0x3c,%eax
  4000d0:   bf 00 00 00 00          mov    $0x0,%edi
  4000d5:   0f 05                   syscall

So the only thing that changed from the object file is the critical instruction:

  4000ba:   48 be d8 00 60 00 00    movabs $0x6000d8,%rsi
  4000c1:   00 00 00

which now points to the address 0x6000d8 (d8 00 60 00 00 00 00 00 in little-endian) instead of 0x0.

Is this the right location for the hello_world string?

To decide, we have to check the program headers, which tell Linux where to load each segment of the file, and therefore each section inside it.

We display them with:

readelf -l hello_world.out

which gives:

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x00000000000000d7 0x00000000000000d7  R E    200000
  LOAD           0x00000000000000d8 0x00000000006000d8 0x00000000006000d8
                 0x000000000000000d 0x000000000000000d  RW     200000

 Section to Segment mapping:
  Segment Sections...
   00     .text
   01     .data

This tells us that the .data section, which is in the second segment, starts at VirtAddr = 0x6000d8.

And the only thing in the .data section is our hello world string.
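The same check can be written down as a small Python sketch. It assumes the usual rule that a virtual address inside a LOAD segment maps to file offset p_offset + (vaddr - p_vaddr); the tuples are copied from the readelf -l output above and the function name is made up for the example.

```python
# (type, p_offset, p_vaddr, p_filesz), copied from the readelf -l output.
segments = [
    ("LOAD", 0x00, 0x400000, 0xD7),
    ("LOAD", 0xD8, 0x6000D8, 0x0D),
]

def vaddr_to_file_offset(vaddr):
    """Map a virtual address to its offset in the executable file."""
    for _type, p_offset, p_vaddr, p_filesz in segments:
        if p_vaddr <= vaddr < p_vaddr + p_filesz:
            return p_offset + (vaddr - p_vaddr)
    raise ValueError(f"{vaddr:#x} is not inside any LOAD segment")

# The string defined in hello_world.asm: "Hello world!" followed by byte 10.
message = b"Hello world!\n"

print(hex(vaddr_to_file_offset(0x6000D8)), len(message) == 0x0D)
```

The relocated address 0x6000d8 is the first byte of the second segment, and that segment's FileSiz of 0xd is exactly the 13 bytes of the string, which is consistent with the address pointing at hello_world.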
