32-bit absolute addresses no longer allowed in x86-64 Linux?


Problem description


64-bit Linux uses the small memory model by default, which puts all code and static data below the 2GB address limit. This ensures that you can use 32-bit absolute addresses. Older versions of gcc use 32-bit absolute addresses for static arrays in order to save an extra instruction for relative address calculation. However, this no longer works. If I try to make a 32-bit absolute address in assembly, I get the linker error: "relocation R_X86_64_32S against `.data' can not be used when making a shared object; recompile with -fPIC". This error message is misleading, of course, because I am not making a shared object and -fPIC doesn't help. What I have found out so far is this: gcc version 4.8.5 uses 32-bit absolute addresses for static arrays; gcc version 6.3.0 doesn't. Version 5 probably doesn't either. The linker in binutils 2.24 allows 32-bit absolute addresses; version 2.28 does not.

The consequence of this change is that old libraries have to be recompiled and legacy assembly code is broken.

Now I want to ask: When was this change made? Is it documented somewhere? And is there a linker option that makes it accept 32-bit absolute addresses?

Solution

Your distro configured gcc with --enable-default-pie, so it's making position-independent executables by default (allowing for ASLR of the executable as well as libraries). 32-bit absolute relocations aren't allowed in an ELF shared object, because that would stop it from being loaded outside the low 4GiB.

64-bit absolute addresses are still allowed in Linux ELF shared objects, with text relocations to allow loading at different addresses (ASLR and shared libraries).


So mov rdi, qword msg works (NASM/YASM syntax for 10-byte mov r64, imm64, aka AT&T syntax movabs, the only instruction which can use a 64-bit immediate). But that's larger and usually slower than lea rdi, [rel msg], which is what you should use if you decide not to disable -pie. You can use default rel instead of specifying it in every [symbol] addressing mode.

In position-dependent code (-no-pie), you should use mov edi, msg when you want an address in a register; 5-byte mov r32, imm32 is even smaller and runs on more ports than RIP-relative LEA.
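To make the size comparison concrete, here is a minimal sketch that builds the machine-code bytes for the three ways of getting an address into rdi/edi discussed above. The helper names and the sample address 0x601040 are made up for illustration; only the opcode bytes come from the x86-64 instruction encoding.

```python
import struct

def mov_edi_imm32(addr):
    """mov edi, imm32: opcode B8+rd (rd=7 for edi), then a 32-bit immediate."""
    return bytes([0xBF]) + struct.pack("<I", addr)

def lea_rdi_rip_rel(disp):
    """lea rdi, [rip + disp32]: REX.W (48), opcode 8D, ModRM 3D, signed disp32."""
    return bytes([0x48, 0x8D, 0x3D]) + struct.pack("<i", disp)

def movabs_rdi_imm64(addr):
    """mov rdi, imm64 (AT&T movabs): REX.W (48), opcode B8+rd, 64-bit immediate."""
    return bytes([0x48, 0xBF]) + struct.pack("<Q", addr)

# 0x601040 is a hypothetical address of msg in a non-PIE executable;
# the LEA displacement is likewise just a placeholder value.
for name, code in [
    ("mov edi, msg", mov_edi_imm32(0x601040)),
    ("lea rdi, [rel msg]", lea_rdi_rip_rel(0x1234)),
    ("mov rdi, qword msg", movabs_rdi_imm64(0x601040)),
]:
    print(f"{name:22} {len(code):2} bytes  {code.hex(' ')}")
```

The lengths come out as 5, 7 and 10 bytes respectively, matching the sizes quoted in the text. In a real object file the immediate or displacement fields would be filled in by the linker via relocations (R_X86_64_32, R_X86_64_PC32, R_X86_64_64), which is where the PIE restriction bites.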


Use gcc -fno-pie -no-pie to override this back to the old behaviour. -no-pie is the linker option, -fno-pie is the code-gen option. With only -fno-pie, gcc will make code like mov eax, offset .LC0 that doesn't link with the still-enabled -pie.

(clang can have PIE enabled by default, too: use clang -fno-pie -nopie. A July 2017 patch made -no-pie an alias for -nopie, for compatibility with gcc, but clang 4.0.1 doesn't have it.)


With only -no-pie (but still -fpie), compiler-generated code (from C or C++ sources) will be slightly slower and larger than necessary, but will still be linked into a position-dependent executable which won't benefit from ASLR. The penalty for executables is mostly for stuff like indexing static arrays, as Agner describes in the question, where using a static address as a 32-bit immediate or as part of a [disp32 + index*4] addressing mode saves instructions and registers vs. a RIP-relative LEA to get an address into a register. Also, 5-byte mov r32, imm32 instead of 7-byte lea r64, [rel symbol] for every case like passing the address of a string literal or other static data to a function.

-fPIE still assumes no symbol-interposition for global variables / functions, unlike -fPIC for shared libraries which have to go through the GOT to access globals (which is yet another reason to use static for any variables that can be limited to file scope instead of global). See The sorry state of dynamic libraries on Linux.

Thus -fPIE is much less bad than -fPIC for 64-bit code, but still bad for 32-bit because RIP-relative addressing isn't available. See some examples on the Godbolt compiler explorer. On average, -fPIE has a very small performance / code-size downside in 64-bit code. The worst case for a specific loop might only be a few %. But 32-bit PIE can be much worse.

None of these -f code-gen options make any difference when just linking, or when assembling .S hand-written asm. gcc -fno-pie -no-pie -O3 main.c nasm_output.o is a case where you want both options.


If your GCC was configured this way, gcc -v |& grep -o -e '[^ ]*pie' prints --enable-default-pie. Support for this config option was added to gcc in early 2015. Ubuntu enabled it in 16.10, and Debian around the same time in gcc 6.2.0-7 (leading to kernel build errors: https://lkml.org/lkml/2016/10/21/904). Related: Build compressed x86 kernels as PIE. See also: https://security.stackexchange.com/questions/41697/why-doesnt-linux-randomize-the-address-of-the-executable-code-segment
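The same check can be scripted. The following is a rough sketch that mirrors the grep above by scanning the text gcc -v prints; the function names are made up for illustration, and the sample configure line used below is invented, not real gcc output.

```python
import re
import subprocess

def pie_flags(gcc_v_output):
    """Return every whitespace-delimited token ending in 'pie', like grep -o '[^ ]*pie'."""
    return re.findall(r"\S*pie", gcc_v_output)

def is_default_pie(gcc_v_output):
    """True if the configure line contains --enable-default-pie."""
    return "--enable-default-pie" in pie_flags(gcc_v_output)

def gcc_v_text(gcc="gcc"):
    """Run `gcc -v`; it writes its report to stderr, which is why the shell version uses |&."""
    result = subprocess.run([gcc, "-v"], capture_output=True, text=True)
    return result.stderr + result.stdout

# Invented sample line, for illustration only:
sample = "Configured with: ../src/configure --enable-default-pie --disable-werror"
print(pie_flags(sample))
print(is_default_pie(sample))
```

On a machine with gcc installed, is_default_pie(gcc_v_text()) would report how that particular compiler was configured.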


Note that ld itself didn't change its default. It still works normally (at least on Arch Linux with binutils 2.28). The change is that gcc defaults to passing -pie as a linker option, unless you explicitly use -static or -no-pie.

In a NASM source file, I used a32 mov eax, [abs buf] to get an absolute address. (I was testing if the 6-byte way to encode small absolute addresses (address-size + mov eax,moffs: 67 a1 40 f1 60 00) has an LCP stall on Intel CPUs. It does.)
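As a sanity check on the byte sequence quoted above, this small sketch rebuilds it by hand. The address 0x60f140 is simply the little-endian value of the four address bytes in that sequence; the function names are illustrative.

```python
import struct

def a32_mov_eax_moffs32(addr):
    """a32 mov eax, [abs addr]: address-size prefix 67, opcode A1, 32-bit moffs."""
    return bytes([0x67, 0xA1]) + struct.pack("<I", addr)

def mov_eax_moffs64(addr):
    """Without the 67 prefix, A1 in 64-bit mode takes a full 64-bit moffs."""
    return bytes([0xA1]) + struct.pack("<Q", addr)

short_form = a32_mov_eax_moffs32(0x60F140)
long_form = mov_eax_moffs64(0x60F140)
print(short_form.hex(" "), len(short_form))  # 67 a1 40 f1 60 00 6
print(len(long_form))                        # 9
```

The 67 prefix changes the length of the rest of the instruction (a 4-byte instead of an 8-byte address field), which is the property that makes it a length-changing prefix.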

nasm -felf64 -Worphan-labels -g -Fdwarf testloop.asm &&
ld -o testloop testloop.o              # works

gcc -v -nostdlib testloop.o            # doesn't work
...
..../collect2  ... -pie ...
/usr/bin/ld: testloop.o: relocation R_X86_64_32 against `.bss' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: error: ld returned 1 exit status

gcc -v -no-pie -nostdlib testloop.o    # works
gcc -v -static -nostdlib testloop.o    # also works: -static implies -no-pie

Related: building static / dynamic executables with/without libc, defining _start or main.


Checking if an existing executable is PIE or not

file and readelf say that PIEs are "shared objects", not ELF executables. Static executables can't be PIE.

$ gcc -fno-pie  -no-pie -O3 hello.c
$ file a.out
a.out: ELF 64-bit LSB executable, ...

$ gcc -O3 hello.c
$ file a.out
a.out: ELF 64-bit LSB shared object, ...

This has been asked at: https://unix.stackexchange.com/questions/89211/test-whether-linux-binary-is-compiled-as-position-independent-code
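What file reports here comes from the e_type field of the ELF header: ET_EXEC (2) for a position-dependent executable and ET_DYN (3) for a shared object, which is also how a PIE is marked. A minimal sketch of reading that field directly follows; the function names are made up for illustration, and it does not try to tell a PIE apart from an ordinary shared library.

```python
import struct

ET_EXEC = 2  # position-dependent executable
ET_DYN = 3   # shared object; PIE executables use this type too

def elf_type(header):
    """Classify an ELF file from (at least) its first 18 bytes."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # EI_DATA (byte 5) gives the byte order: 1 = little-endian, 2 = big-endian.
    fmt = "<H" if header[5] == 1 else ">H"
    (e_type,) = struct.unpack_from(fmt, header, 16)  # e_type follows the 16-byte e_ident
    if e_type == ET_EXEC:
        return "executable (non-PIE)"
    if e_type == ET_DYN:
        return "shared object (PIE or library)"
    return "other (e_type=%d)" % e_type

def elf_type_of_file(path):
    """Read just enough of the file to classify it."""
    with open(path, "rb") as f:
        return elf_type(f.read(18))
```

For example, elf_type_of_file("a.out") should return the non-PIE string for the first build above and the shared-object string for the second.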


Semi-related (but not really): another recent gcc feature is gcc -fno-plt. Finally, calls into shared libraries can be just call [rip + symbol@GOTPCREL] (AT&T call *puts@GOTPCREL(%rip)), with no PLT trampoline.

Distros will hopefully start enabling it soon, because it also avoids needing writeable + executable memory pages. It's a significant speedup for programs that make a lot of shared-library calls, e.g. x86-64 clang -O2 -g compiling tramp3d goes from 41.6s to 36.8s on whatever hardware the patch author tested on. (clang is maybe a worst-case scenario for shared-library calls.)

It does require early binding instead of lazy dynamic linking, so it's slower for big programs that exit right away. (e.g. clang --version or compiling hello.c). This slowdown could be reduced with prelink, apparently.

This doesn't remove the GOT overhead for external variables in shared library PIC code, though. (See the godbolt link above).
