Unexpected value of a function pointer local variable
I have done some experiments in which I created a local variable of type pointer to function that points to printf. Then I called printf both directly and through that variable, as follows:
#include<stdio.h>
typedef int (*func)(const char*,...);
int main()
{
func x=printf;
printf("%p\n", x);
x("%p\n", x);
return 0;
}
I compiled it, looked at the disassembly of main using gdb, and got this:
0x000000000000063a <+0>: push %rbp
0x000000000000063b <+1>: mov %rsp,%rbp
0x000000000000063e <+4>: sub $0x10,%rsp
0x0000000000000642 <+8>: mov 0x20098f(%rip),%rax # 0x200fd8
0x0000000000000649 <+15>: mov %rax,-0x8(%rbp)
0x000000000000064d <+19>: mov -0x8(%rbp),%rax
0x0000000000000651 <+23>: mov %rax,%rsi
0x0000000000000654 <+26>: lea 0xb9(%rip),%rdi # 0x714
0x000000000000065b <+33>: mov $0x0,%eax
0x0000000000000660 <+38>: callq 0x520 <printf@plt>
0x0000000000000665 <+43>: mov -0x8(%rbp),%rax
0x0000000000000669 <+47>: mov -0x8(%rbp),%rdx
0x000000000000066d <+51>: mov %rax,%rsi
0x0000000000000670 <+54>: lea 0x9d(%rip),%rdi # 0x714
0x0000000000000677 <+61>: mov $0x0,%eax
0x000000000000067c <+66>: callq *%rdx
0x000000000000067e <+68>: mov $0x0,%eax
0x0000000000000683 <+73>: leaveq
0x0000000000000684 <+74>: retq
What is weird to me is that calling to printf directly uses the plt (as expected) but calling it using the local variable uses a whole different address (as you can see in line 4 of the assembly that the value stored in local variable x is not the address of the plt entry).
How can that be? Don't all the calls to functions undefined in the executable go first through the plt for better performance and for pic code?
(as you can see in line 4 of the assembly that the value stored in local variable x is not the address of the plt entry)
Huh? The value isn't visible in the disassembly, only the location it's loaded from. (In practice it's not loading a pointer to the PLT entry, but line 4 of the assembly doesn't tell you that; see footnote 1.) Use objdump -dR to see dynamic relocations.
That's a load from memory using a RIP-relative addressing mode. In this case it's loading the real address of printf in libc. That pointer is stored in the Global Offset Table (GOT).
To make this work, the printf symbol gets "early binding" instead of lazy dynamic linking, avoiding PLT overhead for later uses of that function pointer.
Footnote 1: Although maybe you were basing that reasoning on the fact that it's a load instead of a RIP-relative LEA. That pretty much does tell you it's not the PLT entry; part of the point of the PLT is to have an address that's a link-time constant for call rel32, which also enables LEA with a RIP+rel32 addressing mode. The compiler would have used that if it wanted the PLT address in a register.
BTW, the PLT stub itself also uses the GOT entry for its memory-indirect jump. For symbols that are only used as function call targets, the GOT entry initially holds a pointer back into the PLT stub, to the push / jmp instructions that invoke the lazy dynamic linker to resolve that PLT entry, i.e. to update the GOT entry.
Don't all the calls to functions undefined in the executable go first through the plt for better performance
No, the PLT costs runtime performance by adding an extra level of indirection to every call. gcc -fno-plt uses early binding instead of waiting for the first call, so it can inline the indirect call through the GOT right into each call site.
The PLT exists to avoid runtime fixups of call rel32 offsets during dynamic linking. On 64-bit systems it also allows reaching addresses that are more than 2GB away, and it supports symbol interposition. See https://www.macieira.org/blog/2012/01/sorry-state-of-dynamic-libraries-on-linux/ (written before -fno-plt existed; it's basically like one of the ideas he was suggesting).
The PLT's lazy binding can improve startup performance vs. early binding, but on modern systems where cache hits are very important, doing all the symbol-scanning stuff at once during startup is nice.
and for pic code?
Your code is PIC, or actually PIE (position-independent executable), which most distros configure GCC to do by default.
I expected x to point to the address of the PLT entry of printf
If you use -fno-pie, then the address of the PLT entry is a link-time constant, and at compile time the compiler doesn't know whether you're going to link libc statically or dynamically. So it uses mov $printf, %eax to get the function's address into a register, and at link time that can only convert to mov $printf@plt, %eax.
See it on Godbolt. (The Godbolt default is -fno-pie, unlike on most current Linux distros.)
# gcc9.2 -O3 -fpie for your first block
movq printf@GOTPCREL(%rip), %rbp
leaq .LC0(%rip), %rdi
xorl %eax, %eax
movq %rbp, %rsi # saved for later in rbp
call printf@PLT
vs.
# gcc9.2 -O3 -fno-pie
movl $printf, %esi # linker converts this symbol reference to printf@plt
movl $.LC0, %edi
xorl %eax, %eax
call printf # will convert at link-time to printf@plt
# next use also just uses mov-immediate to rematerialize, instead of saving a load result in a register.
So a PIE executable actually has better efficiency for repeated use of function pointers to functions in standard libraries: the pointer is the final address, not just the PLT entry.
-fno-plt -fno-pie works more like PIE mode for taking function pointers, except that it can still use $foo 32-bit immediates for the addresses of symbols in the same file, instead of a RIP-relative LEA.
# gcc9.2 -O3 -fno-plt -fno-pie
movq printf@GOTPCREL(%rip), %rbp # saved for later in RBP
movl $.LC0, %edi
xorl %eax, %eax
movq %rbp, %rsi
call *printf@GOTPCREL(%rip)
# pointers to static functions can use mov $foo, %esi
It seems you need int foo(const char*,...) __attribute__((visibility("hidden"))); to tell the compiler it definitely doesn't need to go through the GOT for this symbol, with -fpie or -fno-plt.
Leaving it until link time for the linker to convert symbol to symbol@plt if necessary allows the compiler to always use efficient 32-bit absolute immediates or RIP-relative addressing, and only end up with PLT indirection for functions that turn out to be in a shared library. But then you end up with pointers to PLT entries, instead of pointers to the final address.
If you were using Intel syntax, it would be mov rbp, QWORD PTR printf@GOTPCREL[rip] in GCC's output for this, if you look at asm instead of disassembly.
Looking at compiler output gives you significantly more information than just numeric offsets from RIP in plain objdump output. -r to show relocation symbols helps some, but compiler output is generally better. (Except you don't see that printf gets rewritten to printf@plt.)