How does the linker handle identical template instantiations across translation units?


Question

Suppose I have two translation-units:

foo.cpp

#include <vector>

void foo() {
  auto v = std::vector<int>();
}

bar.cpp

#include <vector>

void bar() {
  auto v = std::vector<int>();
}

When I compile these translation-units, each will instantiate std::vector<int>.

My question is: how does this work at the linking stage?

  • Do the two instantiations have different names?
  • Does the linker remove them as duplicates?

Recommended answer

C++ requires that an inline function definition be present in every translation unit that references the function. Member functions defined in the body of a class template are implicitly inline, yet by default they are instantiated with external linkage. Hence the duplicate definitions that the linker sees when the same template is instantiated with the same template arguments in different translation units. How the linker copes with this duplication is your question.

Your C++ compiler is subject to the C++ Standard, but your linker is not subject to any codified standard as to how it shall link C++: it is a law unto itself, rooted in computing history and indifferent to the source language of the object code it links. Your compiler has to work with what a target linker can and will do so that you can successfully link your programs and see them do what you expect. So I'll show you how the GCC C++ compiler interworks with the GNU linker to handle identical template instantiations in different translation units.

This demonstration exploits the fact that while the C++ Standard requires - by the One Definition Rule - that the instantiations in different translation units of the same template with the same template arguments shall have the same definition, the compiler - of course - cannot enforce any requirement like that on relationships between different translation units. It has to trust us.

So we'll instantiate the same template with the same parameters in different translation units, but we'll cheat by injecting a macro-controlled difference into the implementations in different translation units that will subsequently show us which definition the linker picks.

If you suspect this cheat invalidates the demonstration, remember: the compiler cannot know whether the ODR is ever honoured across different translation units, so it cannot behave differently on that account, and there's no such thing as "cheating" the linker. Anyhow, the demo will demonstrate that it is valid.

First we have our cheat template header:

thing.hpp

#ifndef THING_HPP
#define THING_HPP
#ifndef ID
#error ID undefined
#endif

template<typename T>
struct thing
{
    T id() const {
        return T{ID};
    }
};

#endif

The value of the macro ID is the tracer value we can inject.

Next a source file:

foo.cpp

#define ID 0xf00
#include "thing.hpp"

unsigned foo()
{
    thing<unsigned> t;
    return t.id();
}

It defines function foo, in which thing<unsigned> is instantiated to define t, and t.id() is returned. By being a function with external linkage that instantiates thing<unsigned>, foo serves the purposes of:-

  • obliging the compiler to do that instantiating at all
  • exposing the instantiation in linkage so we can then probe what the linker does with it.

Another source file:

boo.cpp

#define ID 0xb00
#include "thing.hpp"

unsigned boo()
{
    thing<unsigned> t;
    return t.id();
}

which is just like foo.cpp except that it defines boo in place of foo and sets ID = 0xb00.

And lastly a program source:

main.cpp

#include <iostream>

extern unsigned foo();
extern unsigned boo();

int main()
{
    std::cout << std::hex 
    << '\n' << foo()
    << '\n' << boo()
    << std::endl;
    return 0;
}

This program will print, as hex, the return value of foo() - which our cheat should make = f00 - then the return value of boo() - which our cheat should make = b00.

Now we'll compile foo.cpp, and we'll do it with -save-temps because we want a look at the assembly:

$ g++ -c -save-temps foo.cpp

This writes the assembly in foo.s and the portion of interest there is the definition of thing<unsigned int>::id() const (mangled = _ZNK5thingIjE2idEv):

    .section    .text._ZNK5thingIjE2idEv,"axG",@progbits,_ZNK5thingIjE2idEv,comdat
    .align 2
    .weak   _ZNK5thingIjE2idEv
    .type   _ZNK5thingIjE2idEv, @function
_ZNK5thingIjE2idEv:
.LFB2:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movq    %rdi, -8(%rbp)
    movl    $3840, %eax
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc

Three of the directives at the top are significant:

.section    .text._ZNK5thingIjE2idEv,"axG",@progbits,_ZNK5thingIjE2idEv,comdat

This one puts the function definition in a linkage section of its own called .text._ZNK5thingIjE2idEv that will be output, if it's needed, merged into the .text (i.e. code) section of the program in which the object file is linked. A linkage section like that, i.e. .text.<function_name>, is called a function-section. It's a code section that contains only the definition of function <function_name>. The G flag in "axG" and the trailing _ZNK5thingIjE2idEv,comdat make the section a member of a COMDAT group of that name, which lets the linker keep one copy of the group and discard the rest.

The directive:

.weak   _ZNK5thingIjE2idEv

is crucial. It classifies thing<unsigned int>::id() const as a weak symbol. The GNU linker recognises strong symbols and weak symbols. For a strong symbol, the linker will accept only one definition in the linkage. If there are more, it will give a multiple-definition error. But for a weak symbol, it will tolerate any number of definitions, and pick one. If a weakly defined symbol also has (just one) strong definition in the linkage then the strong definition will be picked. If a symbol has multiple weak definitions and no strong definition, then the linker can pick any one of the weak definitions, arbitrarily.
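To make those rules concrete, here is a toy model of the choice the linker makes for a single symbol. It is only a sketch: the names Binding, Definition and pick_definition are invented for illustration and have nothing to do with the real implementation of GNU ld. Where the rules allow an arbitrary choice among weak definitions, the model assumes the first one in link order is taken, which is one permissible outcome.

```cpp
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical types for illustration only; not part of any real linker.
enum class Binding { Weak, Strong };

struct Definition {
    std::string object_file;  // e.g. "foo.o"
    Binding binding;
};

// Given every definition of ONE symbol, in link order, return the object file
// whose definition is used. Returns std::nullopt if the symbol is undefined.
std::optional<std::string> pick_definition(const std::vector<Definition>& defs)
{
    std::optional<std::string> first_weak;
    std::optional<std::string> strong;
    for (const Definition& d : defs) {
        if (d.binding == Binding::Strong) {
            if (strong) {
                // Two strong definitions: the multiple-definition error.
                throw std::runtime_error("multiple definition");
            }
            strong = d.object_file;
        } else if (!first_weak) {
            // Assumption: among weak definitions the first one seen is kept.
            first_weak = d.object_file;
        }
    }
    // A single strong definition beats any number of weak ones.
    return strong ? strong : first_weak;
}
```

Calling pick_definition with two weak definitions, foo.o then boo.o, selects foo.o; swapping the order selects boo.o; adding one strong definition anywhere makes that one win.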

The directive:

.type   _ZNK5thingIjE2idEv, @function

classifies thing<unsigned int>::id() const as referring to a function - not data.

Then in the body of the definition, the code is assembled at the address labelled by the weak global symbol _ZNK5thingIjE2idEv, the same one locally labelled .LFB2. The code returns 3840 ( = 0xf00).

Next we'll compile boo.cpp the same way:

$ g++ -c -save-temps boo.cpp

and look again in boo.s:

    .section    .text._ZNK5thingIjE2idEv,"axG",@progbits,_ZNK5thingIjE2idEv,comdat
    .align 2
    .weak   _ZNK5thingIjE2idEv
    .type   _ZNK5thingIjE2idEv, @function
_ZNK5thingIjE2idEv:
.LFB2:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movq    %rdi, -8(%rbp)
    movl    $2816, %eax
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc

It's identical, except for our cheat: this definition returns 2816 ( = 0xb00).
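If the decimal immediates in the two listings look unfamiliar, a few lines of C++ confirm that they are just our tracer IDs. This is a small sketch; the helper name to_hex is made up for illustration, and it formats values the same way main.cpp does.

```cpp
#include <sstream>
#include <string>

// Formats a value as lower-case hex with no prefix, as std::hex does in main.cpp.
std::string to_hex(unsigned value)
{
    std::ostringstream out;
    out << std::hex << value;
    return out.str();
}

// The immediates in the assembly are the IDs injected by each source file.
static_assert(3840 == 0xf00, "movl $3840 in foo.s is ID from foo.cpp");
static_assert(2816 == 0xb00, "movl $2816 in boo.s is ID from boo.cpp");
```

So to_hex(3840) gives "f00" and to_hex(2816) gives "b00", which are the strings we would naively expect the program to print.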

While we're here, let's note something that might or might not go without saying: Once we're in assembly (or object code), classes have evaporated. Here, we're down to: -

  • data
  • code
  • symbols, which can label data or label code.

So nothing here specifically represents the instantiation of thing<T> for T = unsigned. All that's left of thing<unsigned> in this instance is the definition of _ZNK5thingIjE2idEv a.k.a thing<unsigned int>::id() const.
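You can check the correspondence between the mangled and the readable name from within C++ as well. The sketch below assumes a compiler that follows the Itanium C++ ABI and ships <cxxabi.h> (GCC and Clang do); the wrapper name demangle is my own.

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <memory>
#include <string>

// Returns the demangled form of an Itanium-ABI symbol name, or the input
// unchanged if demangling fails.
std::string demangle(const char* mangled)
{
    int status = 0;
    // __cxa_demangle returns a malloc'ed buffer that the caller must free.
    std::unique_ptr<char, void (*)(void*)> text(
        abi::__cxa_demangle(mangled, nullptr, nullptr, &status), std::free);
    return (status == 0 && text) ? std::string(text.get()) : std::string(mangled);
}
```

demangle("_ZNK5thingIjE2idEv") yields thing<unsigned int>::id() const. Since both foo.cpp and boo.cpp instantiate the same member of the same specialization, both object files define this one symbol name.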

So now we know what the compiler does about instantiating thing<unsigned> in a given translation unit. If it is obliged to instantiate a thing<unsigned> member function, then it assembles the definition of the instantiated member function at a weak global symbol that identifies the member function, and it puts this definition into its own function-section.
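Templates are not special in this respect. GCC's documentation describes the same treatment, under the name "vague linkage", for ordinary inline functions with external linkage. The following sketch is something you could compile with g++ -S to compare; the names tracer, tracer_ptr and call_tracer are invented, and I'd expect, but have not shown here, that on an ELF target the out-of-line copy of _Z6tracerv appears with the same .weak and comdat directives.

```cpp
// An ordinary inline function with external linkage. Like an implicitly
// instantiated template member, it may be defined in every translation unit
// that uses it, so no single definition can be treated as "the" strong one.
inline unsigned tracer()
{
    return 0xf00;
}

// Taking the address forces an out-of-line definition to be emitted even when
// every direct call could be inlined.
unsigned (*const tracer_ptr)() = &tracer;

unsigned call_tracer()
{
    return tracer_ptr();
}
```

Within a single translation unit call_tracer() simply returns 0xf00; the interesting part is the directives in the generated assembly.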

Now let's see what the linker does.

First we'll compile the main source file.

$ g++ -c main.cpp

Then link all the object files, requesting a diagnostic trace on _ZNK5thingIjE2idEv, and a linkage map file:

$ g++ -o prog main.o foo.o boo.o -Wl,--trace-symbol='_ZNK5thingIjE2idEv',-M=prog.map
foo.o: definition of _ZNK5thingIjE2idEv
boo.o: reference to _ZNK5thingIjE2idEv

So the linker tells us that the program gets the definition of _ZNK5thingIjE2idEv from foo.o and calls it in boo.o.

Running the program shows it's telling the truth:

$ ./prog

f00
f00

Both foo() and boo() are returning the value of thing<unsigned>().id() as instantiated in foo.cpp.

What has become of the other definition of thing<unsigned int>::id() const in boo.o? The map file shows us:

prog.map

...
Discarded input sections
 ...
 ...
 .text._ZNK5thingIjE2idEv
                0x0000000000000000        0xf boo.o
 ...
 ...

The linker chucked away the function-section in boo.o that contained the other definition.

Let's now link prog again, but this time with foo.o and boo.o in the reverse order:

$ g++ -o prog main.o boo.o foo.o -Wl,--trace-symbol='_ZNK5thingIjE2idEv',-M=prog.map
boo.o: definition of _ZNK5thingIjE2idEv
foo.o: reference to _ZNK5thingIjE2idEv

This time, the program gets the definition of _ZNK5thingIjE2idEv from boo.o and calls it in foo.o. The program confirms that:

$ ./prog

b00
b00

and the map file shows:

...
Discarded input sections
 ...
 ...
 .text._ZNK5thingIjE2idEv
                0x0000000000000000        0xf foo.o
 ...
 ...

that the linker chucked away the function-section .text._ZNK5thingIjE2idEv from foo.o.

That completes the picture.

The compiler emits, in each translation unit, a weak definition of each instantiated template member in its own function-section. The linker then just picks the first of those weak definitions that it encounters in the linkage sequence when it needs to resolve a reference to the weak symbol. Because each of the weak symbols addresses a definition, any one of them - in particular, the first one - can be used to resolve all references to the symbol in the linkage, and the rest of the weak definitions are expendable. The surplus weak definitions must be ignored, because the linker can only link one definition of a given symbol. And the surplus weak definitions can be discarded by the linker, with no collateral damage to the program, because the compiler placed each one in a linkage section all by itself.

By picking the first weak definition it sees, the linker is effectively picking at random, because the order in which object files are linked is arbitrary. But this is fine, as long as we obey the ODR across multiple translation units, because if we do, then all of the weak definitions are indeed identical. The usual practice of #include-ing a class template everywhere from a header file (and not macro-injecting any local edits when we do so) is a fairly robust way of obeying the rule.
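If you genuinely want per-translation-unit variation without breaking the rule, one option is to make the varying value part of the template's identity instead of smuggling it in through a macro. The sketch below is a hypothetical variant of thing (traced_thing, foo_thing, boo_thing, foo_id and boo_id are names I've made up): different IDs now yield different specializations, hence different mangled names, so no two differing definitions ever compete for one symbol.

```cpp
#include <type_traits>

// Hypothetical variant of thing<T>: the tracer is a template argument, so it
// is part of the instantiation's name rather than a hidden macro edit.
template<typename T, T Id>
struct traced_thing
{
    T id() const {
        return Id;
    }
};

using foo_thing = traced_thing<unsigned, 0xf00>;
using boo_thing = traced_thing<unsigned, 0xb00>;

// Distinct template arguments give distinct types and distinct symbols.
static_assert(!std::is_same<foo_thing, boo_thing>::value,
              "the two tracers are different instantiations");

unsigned foo_id() { return foo_thing{}.id(); }
unsigned boo_id() { return boo_thing{}.id(); }
```

Here foo_id() returns 0xf00 and boo_id() returns 0xb00 whatever the link order, because the linker never has to choose between them.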
