Why does the PLT exist in addition to the GOT, instead of just using the GOT?

Question

I understand that in a typical ELF binary, functions get called through the Procedure Linkage Table (PLT). The PLT entry for a function usually contains a jump to a Global Offset Table (GOT) entry. This GOT entry initially references some code that loads the actual function address into the GOT, and after the first call (lazy binding) it contains the actual function address.

To be precise, before lazy binding the GOT entry points back into the PLT, to the instructions following the jump into the GOT. These instructions will usually jump to the head of the PLT, from where some binding routine gets called which will then update the GOT entry.

Now I'm wondering why there are two indirections (calling into the PLT and then jumping to an address from the GOT), instead of just sparing the PLT and calling the address from the GOT directly. It looks like this could save a jump and the complete PLT. You would of course still need some code calling the binding routine, but this can be outside the PLT.

Is there anything I am missing? What is/was the purpose of an extra PLT?

Update: As suggested in the comments, I created some (pseudo-) code ASCII art to further explain what I'm referring to:

This is the situation, as far as I understand it, in the current PLT scheme before lazy binding: (Some indirections between the PLT and printf are represented by "...".)

Program                PLT                                 printf
+---------------+      +------------------+                +-----+
| ...           |      | push [0x603008]  |<---+       +-->| ... |
| call j_printf |--+   | jmp [0x603010]   |----+--...--+   +-----+
| ...           |  |   | ...              |    |
+---------------+  +-->| jmp [printf@GOT] |-+  |
                       | push 0xf         |<+  |
                       | jmp 0x400da0     |----+
                       | ...              |
                       +------------------+

… and after lazy binding:

Program                PLT                       printf
+---------------+      +------------------+      +-----+
| ...           |      | push [0x603008]  |  +-->| ... |
| call j_printf |--+   | jmp [0x603010]   |  |   +-----+
| ...           |  |   | ...              |  |
+---------------+  +-->| jmp [printf@GOT] |--+
                       | push 0xf         |
                       | jmp 0x400da0     |
                       | ...              |
                       +------------------+

In my imaginary alternative scheme without a PLT, the situation before lazy binding would look like this: (I kept the code in the "Lazy Binding Table" similar to the one from the PLT. It could also look different; I don't care.)

Program                    Lazy Binding Table                printf
+-------------------+      +------------------+              +-----+
| ...               |      | push [0x603008]  |<-+       +-->| ... |
| call [printf@GOT] |--+   | jmp [0x603010]   |--+--...--+   +-----+
| ...               |  |   | ...              |  |
+-------------------+  +-->| push 0xf         |  |
                           | jmp 0x400da0     |--+
                           | ...              |
                           +------------------+

Now after the lazy binding, one wouldn't use the table anymore:

Program                   Lazy Binding Table        printf
+-------------------+     +------------------+      +-----+
| ...               |     | push [0x603008]  |  +-->| ... |
| call [printf@GOT] |--+  | jmp [0x603010]   |  |   +-----+
| ...               |  |  | ...              |  |
+-------------------+  |  | push 0xf         |  |
                       |  | jmp 0x400da0     |  |
                       |  | ...              |  |
                       |  +------------------+  |
                       +------------------------+

Answer

The problem is that replacing call printf@PLT with call [printf@GOTPLT] requires that the compiler know that the function printf exists in a shared library and not in a static library (or even just a plain object file). The linker can change call printf into call printf@PLT, jmp printf into jmp printf@PLT, or even mov eax, printf into mov eax, printf@PLT, because all it's doing is changing a relocation based on the symbol printf into a relocation based on the symbol printf@PLT. The linker can't change call printf into call [printf@GOTPLT] because it can't tell from the relocation whether the instruction is a CALL, a JMP, or something else entirely. Without knowing whether it's a CALL instruction, it doesn't know whether it should change the opcode from a direct CALL to an indirect CALL.

However, even if there were a special relocation type indicating that the instruction was a CALL, you would still have the problem that a direct call instruction is 5 bytes long but an indirect call instruction is 6 bytes long. The compiler would have to emit code like nop; call printf@CALL to give the linker room to insert the additional byte, and it would have to do that for every call to any global function. It would probably end up being a net performance loss because of all the extra, and usually unnecessary, NOP instructions.

Another problem is that on 32-bit x86 targets the PLT entries are relocated at runtime. The indirect jmp [xxx@GOTPLT] instructions in the PLT don't use relative addressing the way the direct CALL and JMP instructions do, and since the address of xxx@GOTPLT depends on where the image was loaded in memory, the instruction needs to be fixed up to use the correct address. Grouping all these indirect JMP instructions together in one .plt section means that a much smaller number of virtual memory pages need to be modified. Each 4K page that's modified can no longer be shared with other processes, so if the instructions that need to be modified were scattered all over memory, a much larger part of the image would have to be unshared.

Note that this latter issue is only a problem for shared libraries and position-independent executables on 32-bit x86 targets. Traditional executables can't be relocated, so there's no need to fix up the @GOTPLT references, while on 64-bit x86 targets RIP-relative addressing is used to access the @GOTPLT entries.

Because of that last point, newer versions of GCC (6.1 or later) support the -fno-plt flag. On 64-bit x86 targets this option causes the compiler to generate call printf@GOTPCREL[rip] instructions instead of call printf instructions. However, it appears to do this for any call to a function that isn't defined in the same compilation unit, that is, any function it doesn't know for sure isn't defined in a shared library. That means indirect calls would also be used for functions defined in other object files or static libraries. On 32-bit x86 targets the -fno-plt option is ignored unless compiling position-independent code (-fpic or -fpie), where it results in call printf@GOT[ebx] instructions being emitted. In addition to generating unnecessary indirect calls, this also has the disadvantage of requiring a register to be allocated for the GOT pointer, though most functions would need it allocated anyway.

Finally, Windows is able to do what you suggest by declaring symbols in header files with the "dllimport" attribute, indicating that they exist in DLLs. This way the compiler knows whether to generate a direct or an indirect call instruction when calling the function. The disadvantage is that the symbol has to exist in a DLL, so if this attribute is used you can't decide after compilation to link with a static library instead.

Also read Drepper's paper "How To Write Shared Libraries"; it explains this well and in detail (for Linux).
