Dynamic loading and weak symbol resolution

Question

While analyzing this question I found out some things about the behavior of weak symbol resolution in the context of dynamic loading (dlopen) on Linux. Now I'm looking for the specifications governing this.

Let's take an example. Suppose there is a program a which dynamically loads libraries b.so and c.so, in that order. If c.so depends on two other libraries, foo.so (actually libgcc.so in that example) and bar.so (actually libpthread.so), then symbols exported by bar.so can usually be used to satisfy weak symbol linkages in foo.so. But if b.so also depends on foo.so but not on bar.so, then these weak symbols will apparently not be linked against bar.so. It seems as if the linkages in foo.so only look for symbols from a, b.so, and all their dependencies.

This makes sense, to some degree, since otherwise loading c.so might change the behavior of foo.so at a point where b.so has already been using the library. On the other hand, in the question that got me started this caused quite a bit of trouble, so I wonder whether there is a way around the problem. To find one, I first need a good understanding of exactly how symbol resolution is specified in these cases.
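The load sequence in the example can be sketched roughly as follows. This is only an illustration: the helper name load_in_order is hypothetical, and the dlopen() flags are an assumption, since the question does not say which flags program a actually uses.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: loads each library in the order given and returns
   how many loads succeeded. The flags are an assumption; the question
   does not say which ones program a passes to dlopen(). */
size_t load_in_order(const char *const *paths, size_t count) {
    size_t loaded = 0;
    for (size_t i = 0; i < count; i++) {
        void *handle = dlopen(paths[i], RTLD_LAZY | RTLD_LOCAL);
        if (handle == NULL) {
            fprintf(stderr, "dlopen(%s) failed: %s\n", paths[i], dlerror());
            continue;
        }
        /* Dependencies such as foo.so and bar.so are loaded here as well,
           unless an earlier dlopen() already brought them in. The handle
           is deliberately kept open for the life of the process. */
        loaded++;
    }
    return loaded;
}
```

Program a would then call it with something like `const char *const libs[] = { "./b.so", "./c.so" }; load_in_order(libs, 2);`, so that foo.so is first pulled in as a dependency of b.so.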

What specification or other technical document defines the correct behavior in these scenarios?

Recommended answer

Unfortunately, the authoritative documentation is the source code. Most Linux distributions use glibc or its fork, eglibc. In the source code for both, the file that should document dlopen() reads as follows:

manual/libdl.texi

@c FIXME these are undocumented:
@c dladdr
@c dladdr1
@c dlclose
@c dlerror
@c dlinfo
@c dlmopen
@c dlopen
@c dlsym
@c dlvsym

What technical specification there is can be drawn from the ELF specification and the POSIX standard. The ELF specification is what makes a weak symbol meaningful. POSIX is the actual specification for dlopen() itself.

This is what I find to be the most relevant portion of the ELF specification:

When the link editor searches archive libraries, it extracts archive members that contain definitions of undefined global symbols. The member’s definition may be either a global or a weak symbol.

The ELF specification makes no reference to dynamic loading, so the rest of this paragraph is my own interpretation. The reason I find the above relevant is that resolving symbols occurs at a single "when". In the example you give, when program a dynamically loads b.so, the dynamic loader attempts to resolve undefined symbols. It may end up doing so with either global or weak symbols. When the program then dynamically loads c.so, the dynamic loader again attempts to resolve undefined symbols. In the scenario you describe, symbols in b.so were resolved with weak symbols. Once resolved, those symbols are no longer undefined. It doesn't matter whether global or weak symbols were used to define them. They are already no longer undefined by the time c.so is loaded.
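For readers unfamiliar with weak references, here is a minimal sketch of the kind of linkage at issue. It assumes GCC or Clang (the weak attribute is a compiler extension), and optional_hook is a hypothetical name, not a function from libgcc or libpthread. The point is only that a weak reference may legitimately stay unbound, in which case its address compares equal to null.

```c
#include <stddef.h>

/* Hypothetical optional function. The weak attribute (a GCC/Clang
   extension) lets this reference remain undefined at link time. */
extern int optional_hook(void) __attribute__((weak));

/* Returns the hook's result when some loaded object defined it by the
   time this object was bound, or -1 when the reference was left
   unresolved and its address is therefore null. */
int call_hook_if_present(void) {
    if (optional_hook != NULL) {
        return optional_hook();
    }
    return -1;
}
```

If no object defining optional_hook is present when the reference is bound, the function returns -1, and under the interpretation given above, loading a provider later would not necessarily change that.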

The ELF specification gives no precise definition of what a link editor is or of when the link editor must combine object files. Presumably it's a non-issue because the document has dynamic linking in mind.

POSIX describes some of the dlopen() functionality but leaves much up to the implementation, including the substance of your question. POSIX makes no reference to the ELF format or to weak symbols in general. A system implementing dlopen() need not have any notion of weak symbols at all.

http://pubs.opengroup.org/onlinepubs/9699919799/functions/dlopen.html

POSIX compliance is part of another standard, the Linux Standard Base. Linux distributions may or may not choose to follow these standards and may or may not go to the trouble of being certified. For example, I understand that a formal Unix certification by the Open Group is quite expensive, which would explain the abundance of "Unix-like" systems.

An interesting point about the standards compliance of the dlopen() family is made in the Wikipedia article on dynamic loading. dlsym(), the lookup function that POSIX specifies alongside dlopen(), returns a void*, but ISO C says that a void* is a pointer to an object, and such a pointer is not necessarily compatible with a function pointer:

The fact remains that any conversion between function and object pointers has to be regarded as an (inherently non-portable) implementation extension, and that no "correct" way for a direct conversion exists, since in this regard the POSIX and ISO standards contradict each other.
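One workaround commonly seen for the conversion problem quoted above is to copy the pointer's bytes instead of casting. The sketch below is only an illustration under stated assumptions: call_atoi_via_dlsym is a hypothetical helper, atoi is picked merely because the C library is always loaded, and the code assumes object and function pointers have the same size, which holds on Linux but is not guaranteed by ISO C.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <string.h>

typedef int (*atoi_fn)(const char *);

/* Hypothetical helper: looks up atoi through dlsym() and calls it.
   Returns -1 if the handle or the symbol cannot be obtained. */
int call_atoi_via_dlsym(const char *text) {
    /* A NULL path yields a handle for the main program, whose search
       scope includes the already-loaded C library. */
    void *handle = dlopen(NULL, RTLD_LAZY);
    if (handle == NULL) {
        return -1;
    }
    void *symbol = dlsym(handle, "atoi");
    if (symbol == NULL) {
        dlclose(handle);
        return -1;
    }
    atoi_fn fn;
    /* Copy the pointer's bytes rather than casting. This assumes object
       and function pointers have the same size. */
    _Static_assert(sizeof fn == sizeof symbol, "pointer sizes differ");
    memcpy(&fn, &symbol, sizeof fn);
    int value = fn(text);
    dlclose(handle);
    return value;
}
```

The copy sidesteps the cast in source code, but it still relies on the implementation representing both kinds of pointer the same way, so it remains an extension in the sense of the quote.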

The standards that do exist contradict one another, and the standards documents that exist may not be especially meaningful anyway. Here is Ulrich Drepper writing about his disdain for the Open Group and their "specifications":

http://udrepper.livejournal.com/8511.html

A similar sentiment is expressed in the post linked by rodrigo:

The reason I've made this change is not really to be more conformant (it's nice but no reason since nobody complained about the old behaviour).

After looking into it, I believe the proper answer to the question as you've asked it is that there is no right or wrong behavior for dlopen() in this regard. Arguably, once a search has resolved a symbol, it is no longer undefined, and in subsequent searches the dynamic loader will not attempt to resolve the already defined symbol.

Finally, as you state in the comments, what you describe in the original post is not correct. Dynamically loaded shared libraries can be used to resolve undefined symbols in previously dynamically loaded shared libraries. In fact, this isn't limited to undefined symbols in dynamically loaded code. Here is an example in which the executable itself has an undefined symbol that is resolved through dynamic loading.

main.c

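#include <dlfcn.h>
#include <stdio.h>

/* Declared here, but defined only in dyload.so. */
void say_hi(void);

int main(void) {
    /* RTLD_GLOBAL exposes dyload.so's symbols to later symbol lookups,
       which is what lets the call below find say_hi(). */
    void *handle = dlopen("./dyload.so", RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }
    /* Undefined in this executable; resolved from dyload.so. */
    say_hi();
    return 0;
}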

dyload.c

#include <stdio.h>
void say_hi(void) {
    puts("dyload.so: hi");
}

Compile and run.

$ gcc-4.8 main.c -fpic -ldl -Wl,--unresolved-symbols=ignore-all -o main
$ gcc-4.8 dyload.c -shared -fpic -o dyload.so
$ ./main
dyload.so: hi

Note that the main executable itself was compiled as PIC.
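For comparison, the more conventional way to reach say_hi() is an explicit dlsym() lookup, which needs no unresolved symbol in the executable and no special linker flag. The following is a sketch, not part of the original example: call_say_hi is a hypothetical helper, and it assumes the dyload.so built from dyload.c above.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <string.h>

typedef void (*say_hi_fn)(void);

/* Hypothetical helper: loads the library at library_path, looks up
   say_hi with dlsym(), and calls it. Returns 0 on success and -1 if
   loading or lookup fails. */
int call_say_hi(const char *library_path) {
    void *handle = dlopen(library_path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        return -1;
    }
    void *symbol = dlsym(handle, "say_hi");
    if (symbol == NULL) {
        dlclose(handle);
        return -1;
    }
    say_hi_fn fn;
    /* Assumes object and function pointers share a representation,
       as they do on Linux. */
    memcpy(&fn, &symbol, sizeof fn);
    fn();
    dlclose(handle);
    return 0;
}
```

Calling `call_say_hi("./dyload.so")` should print the same greeting as the example above. Because the lookup is explicit, RTLD_LOCAL is enough here, whereas the original example depends on RTLD_GLOBAL so that the executable's own reference can be satisfied.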
