32-bit absolute addresses no longer allowed in x86-64 Linux?

Question

64-bit Linux uses the small memory model by default, which puts all code and static data below the 2GB address limit. This makes sure that you can use 32-bit absolute addresses. Older versions of gcc use 32-bit absolute addresses for static arrays in order to save an extra instruction for relative address calculation. However, this no longer works. If I try to make a 32-bit absolute address in assembly, I get the linker error: "relocation R_X86_64_32S against `.data' can not be used when making a shared object; recompile with -fPIC". This error message is misleading, of course, because I am not making a shared object and -fPIC doesn't help. What I have found out so far is this: gcc version 4.8.5 uses 32-bit absolute addresses for static arrays, gcc version 6.3.0 doesn't. Version 5 probably doesn't either. The linker in binutils 2.24 allows 32-bit absolute addresses, version 2.28 does not.

The consequence of this change is that old libraries have to be recompiled and legacy assembly code is broken.

Now I want to ask: When was this change made? Is it documented somewhere? And is there a linker option that makes it accept 32-bit absolute addresses?

Recommended answer

Your distro configured gcc with --enable-default-pie, so it's making position-independent executables by default (allowing for ASLR of the executable as well as libraries). Most distros are doing that these days.

You actually are making a shared object: PIE executables are sort of a hack using a shared object with an entry-point. The dynamic linker already supported this, and ASLR is nice for security, so this was the easiest way to implement ASLR for executables.

32-bit absolute relocations aren't allowed in an ELF shared object; that would stop them from being loaded outside the low 2GiB (for sign-extended 32-bit addresses). 64-bit absolute addresses are allowed, but generally you only want that for jump tables or other static data, not as part of instructions (see footnote 1).
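
To make the 2GiB limit concrete, here is a minimal sketch in C of the range checks implied by the two 32-bit absolute relocation types: R_X86_64_32S needs the address to survive sign-extension from 32 to 64 bits, and R_X86_64_32 needs it to survive zero-extension. The function names are hypothetical, and the sample addresses mentioned afterwards (0x400000 as the traditional non-PIE link base, 0x555555554000 as a typical PIE load address when ASLR is disabled) are illustrative assumptions, not values taken from the question.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr can be encoded as a sign-extended 32-bit value,
   which is what an R_X86_64_32S relocation requires.
   (Hypothetical helper, for illustration only.) */
bool fits_reloc_32s(uint64_t addr)
{
    return addr <= UINT64_C(0x7fffffff) ||
           addr >= UINT64_C(0xffffffff80000000);
}

/* True if addr can be encoded as a zero-extended 32-bit value,
   which is what an R_X86_64_32 relocation requires.
   (Hypothetical helper, for illustration only.) */
bool fits_reloc_32(uint64_t addr)
{
    return addr <= UINT64_C(0xffffffff);
}
```

An address such as 0x400000 passes both checks, while 0x555555554000 fails both, which is why an object that may be mapped anywhere in the address space cannot carry these relocations.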

The "recompile with -fPIC" part of the error message is bogus for hand-written asm; it's written for the case of people compiling with gcc -c and then trying to link with gcc -shared -o foo.so *.o, with a gcc where -fPIE is not the default. The error message should probably change because many people are running into this error when linking hand-written asm.

Always use RIP-relative addressing for simple cases where there's no downside: e.g. NASM default rel at the top of your file, AT&T foo(%rip), or [rip + foo] in GAS .intel_syntax noprefix. See also footnote 1 below. Only consider using 32-bit absolute addressing when it's actually helpful for code-size instead of harmful.

Use gcc -fno-pie -no-pie to override this back to the old behaviour. -no-pie is the linker option, -fno-pie is the code-gen option. With only -fno-pie, gcc will make code like mov eax, offset .LC0 that doesn't link with the still-enabled -pie.

(clang can have PIE enabled by default, too: use clang -fno-pie -nopie. A July 2017 patch made -no-pie an alias for -nopie, for compatibility with gcc, but clang 4.0.1 doesn't have it.)

With only -no-pie (but still -fpie), compiler-generated code (from C or C++ sources) will be slightly slower and larger than necessary, but will still be linked into a position-dependent executable which won't benefit from ASLR. "Too much PIE is bad for performance" reports an average slowdown of 3% for x86-64 on SPEC CPU2006 (I don't have a copy of the paper, so I don't know what hardware that was on). But in 32-bit code, the average slowdown is 10%, worst-case 25% (on SPEC CPU2006).

The penalty for PIE executables is mostly for stuff like indexing static arrays, as Agner describes in the question, where using a static address as a 32-bit immediate or as part of a [disp32 + index*4] addressing mode saves instructions and registers vs. a RIP-relative LEA to get an address into a register. Also 5-byte mov r32, imm32 instead of 7-byte lea r64, [rel symbol] for getting a static address into a register is nice for passing the address of a string literal or other static data to a function.
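
As a rough illustration of the static-array case, here is a minimal C sketch. The array and function names are hypothetical, and the assembly in the comments is only the kind of code a compiler typically emits for each mode, not output copied from a specific gcc version.

```c
#include <stddef.h>

/* File-scope static data: its address is a link-time constant in a
   non-PIE executable, but is only known relative to RIP in a PIE. */
static int squares[8] = {0, 1, 4, 9, 16, 25, 36, 49};

/* Indexing a static array (the caller must keep i below 8).
   Non-PIE (-fno-pie) code can typically fold the array address into
   the addressing mode as a 32-bit displacement, roughly:
       mov eax, [squares + rdi*4]
   PIE (-fPIE) code typically needs an extra instruction and register:
       lea rdx, [rip + squares]
       mov eax, [rdx + rdi*4]                                        */
int lookup_square(size_t i)
{
    return squares[i];
}
```

Compiling this with gcc -O2 -S once with -fno-pie and once with -fPIE is one way to compare the two forms on your own toolchain.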

-fPIE still assumes no symbol-interposition for global variables / functions, unlike -fPIC for shared libraries which have to go through the GOT to access globals (which is yet another reason to use static for any variables that can be limited to file scope instead of global). See The sorry state of dynamic libraries on Linux.

Thus -fPIE is much less bad than -fPIC for 64-bit code, but still bad for 32-bit because RIP-relative addressing isn't available. See some examples on the Godbolt compiler explorer. On average, -fPIE has a very small performance / code-size downside in 64-bit code. The worst case for a specific loop might only be a few %. But 32-bit PIE can be much worse.

None of these -f code-gen options make any difference when just linking, or when assembling .S hand-written asm. gcc -fno-pie -no-pie -O3 main.c nasm_output.o is a case where you want both options.
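
The code-gen side of this split can be observed from inside a C translation unit. The sketch below relies on the __PIE__ macro, which GCC and clang predefine as 1 under -fpie and 2 under -fPIE; the function name is hypothetical. It says nothing about the link step: an object compiled with -fPIE can still be linked with -no-pie, and the macro will not reflect that.

```c
/* Reports the PIE code-generation level this translation unit was
   compiled with: 0 = not PIE, 1 = -fpie, 2 = -fPIE.
   Relies on the compiler-predefined __PIE__ macro, so it only
   reflects the -f option, not the -pie / -no-pie link option. */
int pie_codegen_level(void)
{
#ifdef __PIE__
    return __PIE__;
#else
    return 0;
#endif
}
```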

If your GCC was configured this way, gcc -v |& grep -o -e '[^ ]*pie' prints --enable-default-pie. Support for this config option was added to gcc in early 2015. Ubuntu enabled it in 16.10, and Debian around the same time in gcc 6.2.0-7 (leading to kernel build errors: https://lkml.org/lkml/2016/10/21/904).

Related: Build compressed x86 kernels as PIE was also affected by the changed default.

Why doesn't Linux randomize the address of the executable code segment? is an older question about why it wasn't the default earlier, or was only enabled for a few packages on older Ubuntu before it was enabled across the board.

Note that ld itself didn't change its default. It still works normally (at least on Arch Linux with binutils 2.28). The change is that gcc defaults to passing -pie as a linker option, unless you explicitly use -static or -no-pie.

In a NASM source file, I used a32 mov eax, [abs buf] to get an absolute address. (I was testing if the 6-byte way to encode small absolute addresses (address-size + mov eax,moffs: 67 a1 40 f1 60 00) has an LCP stall on Intel CPUs. It does.)

nasm -felf64 -Worphan-labels -g -Fdwarf testloop.asm &&
ld -o testloop testloop.o              # works: static executable

gcc -v -nostdlib testloop.o            # doesn't work
...
..../collect2  ... -pie ...
/usr/bin/ld: testloop.o: relocation R_X86_64_32 against `.bss' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: error: ld returned 1 exit status

gcc -v -no-pie -nostdlib testloop.o    # works
gcc -v -static -nostdlib testloop.o    # also works: -static implies -no-pie

GCC can also make a "static PIE" with -static-pie: ASLRed, but with no dynamic libraries or ELF interpreter. This is not the same thing as -static -pie; those conflict with each other (you get a static non-PIE), although that might get changed.

Related: building static / dynamic executables with/without libc, defining _start or main.

This has also been asked at: How to test whether a Linux binary was compiled as position independent code?

file and readelf say that PIEs are "shared objects", not ELF executables. ELF-type EXEC can't be PIE.

$ gcc -fno-pie  -no-pie -O3 hello.c
$ file a.out
a.out: ELF 64-bit LSB executable, ...

$ gcc -O3 hello.c
$ file a.out
a.out: ELF 64-bit LSB shared object, ...

 ## Or with a more recent version of file:
a.out: ELF 64-bit LSB pie executable, ...
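
The same check can be made from inside a running program. The following is a minimal sketch, assuming a 64-bit Linux system with /proc mounted and the glibc <elf.h> header; the function names are hypothetical. It reads the ELF header of the running binary and returns the e_type field that file and readelf report.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Returns the e_type field of the running binary's ELF header:
   ET_EXEC for a position-dependent executable, ET_DYN for a PIE
   (or shared object). Returns -1 on any error.
   Assumes Linux, where /proc/self/exe refers to the running binary. */
int self_elf_type(void)
{
    Elf64_Ehdr hdr;
    FILE *f = fopen("/proc/self/exe", "rb");
    if (!f)
        return -1;
    size_t n = fread(&hdr, 1, sizeof hdr, f);
    fclose(f);
    if (n != sizeof hdr)
        return -1;
    /* Check the ELF magic bytes before trusting the header. */
    if (memcmp(hdr.e_ident, ELFMAG, SELFMAG) != 0)
        return -1;
    return hdr.e_type;
}

/* Short human-readable name for the two types discussed here. */
const char *elf_type_name(int type)
{
    switch (type) {
    case ET_EXEC: return "EXEC";
    case ET_DYN:  return "DYN";
    default:      return "other";
    }
}
```

Built with gcc -fno-pie -no-pie, a program calling self_elf_type() should see ET_EXEC; built with a default-PIE gcc and no extra options, it should see ET_DYN.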

gcc -static-pie is a special thing that GCC doesn't do by default, even with -nostdlib. It shows up as "LSB pie executable, dynamically linked" with current versions of file. (See What's the difference between "statically linked" and "not a dynamic executable" from Linux ldd?). It has ELF-type DYN, but readelf shows no .interp, and ldd will tell you it's statically linked. GDB starti and /proc/maps confirm that execution starts at the top of its _start, not in an ELF interpreter.

Semi-related (but not really): another recent gcc feature is gcc -fno-plt. Finally calls into shared libraries can be just call [rip + symbol@GOTPCREL] (AT&T call *puts@GOTPCREL(%rip)), with no PLT trampoline.

The NASM version of this is call [rel puts wrt ..got] as an alternative to call puts wrt ..plt. See Can't call C standard library function on 64-bit Linux from assembly (yasm) code. This works in a PIE or non-PIE, and avoids having the linker build a PLT stub for you.

Some distros have started enabling it. It also avoids needing writeable + executable memory pages so it's good for security against code-injection. (I think modern PLT implementations don't need that either, just updating a GOT pointer not rewriting a jmp rel32 instruction, so there might not be a security difference.)

It's a significant speedup for programs that make a lot of shared-library calls, e.g. x86-64 clang -O2 -g compiling tramp3d goes from 41.6s to 36.8s on whatever hardware the patch author tested on. (clang is maybe a worst-case scenario for shared library calls, making lots of calls to small LLVM library functions.)

It does require early binding instead of lazy dynamic linking, so it's slower for big programs that exit right away. (e.g. clang --version or compiling hello.c). This slowdown could be reduced with prelink, apparently.

This doesn't remove the GOT overhead for external variables in shared library PIC code, though. (See the Godbolt link above).

Footnote 1

64-bit absolute addresses actually are allowed in Linux ELF shared objects, with text relocations to allow loading at different addresses (ASLR and shared libraries). This allows you to have jump tables in section .rodata, or static const int *foo = &bar; without a runtime initializer.
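
A minimal C sketch of the two cases named above: a statically initialised pointer and a table of code addresses. All names other than foo and bar are hypothetical. In a PIE or shared object, each stored address is a 64-bit absolute value that the dynamic linker fixes up at load time, so no C code has to run to initialise it. Note that an optimising compiler may fold these accesses away, so inspect the object file at -O0 if you want to see the relocations.

```c
static const int bar = 42;

/* A pointer with static storage duration, initialised with an address.
   In position-independent output this needs a 64-bit absolute
   relocation applied at load time, not a runtime initialiser. */
static const int *foo = &bar;

static int add_one(int x) { return x + 1; }
static int twice(int x)   { return x * 2; }
static int negate(int x)  { return -x; }

/* A table of function pointers: every entry is a 64-bit absolute
   address stored in static data, like a hand-written jump table. */
static int (*const ops[])(int) = { add_one, twice, negate };

/* Dispatch through the table; which is reduced modulo the table size. */
int apply_op(unsigned which, int x)
{
    return ops[which % 3](x);
}

/* Read the value that foo points at. */
int read_through_foo(void)
{
    return *foo;
}
```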

So mov rdi, qword msg works (NASM/YASM syntax for 10-byte mov r64, imm64, aka AT&T syntax movabs, the only instruction which can use a 64-bit immediate). But that's larger and usually slower than lea rdi, [rel msg], which is what you should use if you decide not to disable -pie. A 64-bit immediate is slower to fetch from the uop cache on Sandybridge-family CPUs, according to Agner Fog's microarch pdf. (Yes, the same person who asked this question. :)

You can use NASM's default rel instead of specifying it in every [rel symbol] addressing mode. See also "Mach-O 64-bit format does not support 32-bit absolute addresses. NASM Accessing Array" for some more description of avoiding 32-bit absolute addressing. OS X can't use 32-bit addresses at all, so RIP-relative addressing is the best way there, too.

In position-dependent code (-no-pie), you should use mov edi, msg when you want an address in a register; 5-byte mov r32, imm32 is even smaller than RIP-relative LEA, and more execution ports can run it.
