32-bit absolute addresses no longer allowed in x86-64 Linux?
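Problem description
64-bit Linux uses the small memory model by default, which puts all code and static data below the 2GB address limit. This makes sure that you can use 32-bit absolute addresses. Older versions of gcc use 32-bit absolute addresses for static arrays in order to save an extra instruction for relative address calculation. However, this no longer works. If I try to make a 32-bit absolute address in assembly, I get the linker error:

"relocation R_X86_64_32S against `.data' can not be used when making a shared object; recompile with -fPIC".

This error message is misleading, of course, because I am not making a shared object and -fPIC doesn't help.

What I have found out so far is this: gcc version 4.8.5 uses 32-bit absolute addresses for static arrays, gcc version 6.3.0 doesn't. Version 5 probably doesn't either. The linker in binutils 2.24 allows 32-bit absolute addresses, version 2.28 does not.

The consequence of this change is that old libraries have to be recompiled and legacy assembly code is broken.

Now I want to ask: When was this change made? Is it documented somewhere? And is there a linker option that makes it accept 32-bit absolute addresses?

Your distro configured gcc with `--enable-default-pie`, so it makes position-independent executables by default (allowing for ASLR of the executable as well as libraries). 32-bit absolute relocations aren't allowed in an ELF shared object, because that would stop it from being loaded outside the low 4GiB.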
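64-bit absolute addresses are still allowed in Linux ELF shared objects, with text relocations to allow loading at different addresses (ASLR and shared libraries). So `mov rdi, qword msg` works (NASM/YASM syntax for the 10-byte `mov r64, imm64`, aka AT&T syntax `movabs`, the only instruction which can use a 64-bit immediate). But that's larger and usually slower than `lea rdi, [rel msg]`, which is what you should use if you decide not to disable `-pie`. You can use `default rel` instead of specifying it in every `[symbol]` addressing mode.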
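In position-dependent code (`-no-pie`), you should use `mov edi, msg` when you want an address in a register; the 5-byte `mov r32, imm32` is even smaller and runs on more ports than a RIP-relative LEA.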
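Use `gcc -fno-pie -no-pie` to override this back to the old behaviour. `-no-pie` is the linker option, `-fno-pie` is the code-gen option. With only `-fno-pie`, gcc will make code like `mov eax, offset .LC0` that doesn't link with the still-enabled `-pie`.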
(clang can have PIE enabled by default, too: use `clang -fno-pie -nopie`. A July 2017 patch made `-no-pie` an alias for `-nopie`, for compatibility with gcc, but clang 4.0.1 doesn't have it.)
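With only `-no-pie` (but still `-fpie`), compiler-generated code (from C or C++ sources) will be slightly slower and larger than necessary, but will still be linked into a position-dependent executable which won't benefit from ASLR. The penalty for executables is mostly for stuff like indexing static arrays, as Agner describes in the question, where using a static address as a 32-bit immediate or as part of a `[disp32 + index*4]` addressing mode saves instructions and registers vs. a RIP-relative LEA to get an address into a register. It also means a 5-byte `mov r32, imm32` instead of a 7-byte `lea r64, [rel symbol]` for every case like passing the address of a string literal or other static data to a function.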
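To see the two cases described above in compiler output, a minimal sketch like the following can be compiled to assembly with and without PIE code generation. The names `squares`, `square_of` and `greeting` are hypothetical and chosen only for illustration; the exact instructions depend on the gcc version and options.

```c
#include <stddef.h>

/* File-scope lookup table (hypothetical example data). In a non-PIE
   executable its address is a link-time constant below 2GB, so the
   compiler may use it directly as a 32-bit displacement. */
static const int squares[8] = {0, 1, 4, 9, 16, 25, 36, 49};

/* Indexing a static array: the [disp32 + index*4] case. */
int square_of(unsigned i)
{
    return squares[i & 7];   /* mask keeps the index in bounds */
}

/* Returning the address of a string literal: the
   mov r32, imm32 vs. lea r64, [rel symbol] case. */
const char *greeting(void)
{
    return "hello";
}
```

Comparing `gcc -O2 -fno-pie -S -masm=intel` against `gcc -O2 -fpie -S -masm=intel` on this file should show the difference: with `-fno-pie` one would typically expect the table address folded into the load's addressing mode and a `mov` of an immediate for the string, while `-fpie` needs RIP-relative addressing and usually an extra `lea` for the indexed access.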
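`-fPIE` still assumes no symbol-interposition for global variables / functions, unlike `-fPIC` for shared libraries, which have to go through the GOT to access globals (which is yet another reason to use `static` for any variables that can be limited to file scope instead of global). See "The sorry state of dynamic libraries on Linux".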
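The effect of `static` on symbol interposition can be observed with a small sketch such as this one. The variable and function names are hypothetical; the comments describe what gcc is generally expected to emit on x86-64, which may vary by version.

```c
/* External linkage: when this file is built with -fPIC for a shared
   library, another module could interpose this symbol, so accesses
   are expected to go through the GOT. With -fPIE the compiler may
   assume no interposition and address it RIP-relative. */
int global_counter = 0;

/* Internal linkage: the symbol cannot be interposed, so even -fPIC
   code can reach it with a direct RIP-relative access. */
static int local_counter = 0;

/* Increment and return the interposable counter. */
int bump_global(void)
{
    return ++global_counter;
}

/* Increment and return the file-scope counter. */
int bump_local(void)
{
    return ++local_counter;
}
```

Compiling with `gcc -O2 -fPIC -S` and then with `gcc -O2 -fPIE -S` and comparing the bodies of `bump_global` and `bump_local` is one way to check whether the GOT indirection appears only for the global under `-fPIC`.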
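Thus `-fPIE` is much less bad than `-fPIC` for 64-bit code, but still bad for 32-bit because RIP-relative addressing isn't available. See some examples on the Godbolt compiler explorer. On average, `-fPIE` has a very small performance / code-size downside in 64-bit code. The worst case for a specific loop might only be a few percent. But 32-bit PIE can be much worse.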
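None of these `-f` code-gen options make any difference when just linking, or when assembling `.S` hand-written asm. `gcc -fno-pie -no-pie -O3 main.c nasm_output.o` is a case where you want both options.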
If your GCC was configured this way, `gcc -v |& grep -o -e '[^ ]*pie'` prints `--enable-default-pie`. Support for this config option was added to gcc in early 2015. Ubuntu enabled it in 16.10, and Debian around the same time in gcc `6.2.0-7` (leading to kernel build errors: https://lkml.org/lkml/2016/10/21/904). Related: "Build compressed x86 kernels as PIE". See also: https://security.stackexchange.com/questions/41697/why-doesnt-linux-randomize-the-address-of-the-executable-code-segment
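Note that `ld` itself didn't change its default. It still works normally (at least on Arch Linux with binutils 2.28). The change is that `gcc` defaults to passing `-pie` as a linker option, unless you explicitly use `-static` or `-no-pie`.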
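In a NASM source file, I used `a32 mov eax, [abs buf]` to get an absolute address. (I was testing whether the 6-byte way to encode small absolute addresses (address-size prefix + mov eax,moffs: `67 a1 40 f1 60 00`) has an LCP stall on Intel CPUs. It does.)

    nasm -felf64 -Worphan-labels -g -Fdwarf testloop.asm &&
    ld -o testloop testloop.o             # works

    gcc -v -nostdlib testloop.o           # doesn't work
    ...
    ..../collect2  ... -pie ...
    /usr/bin/ld: testloop.o: relocation R_X86_64_32 against `.bss' can not be used when making a shared object; recompile with -fPIC
    /usr/bin/ld: final link failed: Nonrepresentable section on output
    collect2: error: ld returned 1 exit status

    gcc -v -no-pie -nostdlib testloop.o   # works
    gcc -v -static -nostdlib testloop.o   # also works: -static implies -no-pie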
Related: building static / dynamic executables with/without libc, defining `_start` or `main`.
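Checking if an existing executable is PIE or not
This has been asked at: https://unix.stackexchange.com/questions/89211/test-whether-linux-binary-is-compiled-as-position-independent-code

`file` and `readelf` say that PIEs are "shared objects", not ELF executables. Static executables can't be PIE.

    $ gcc -fno-pie -no-pie -O3 hello.c
    $ file a.out
    a.out: ELF 64-bit LSB executable, ...

    $ gcc -O3 hello.c
    $ file a.out
    a.out: ELF 64-bit LSB shared object, ...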
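The distinction that `file` and `readelf` report comes from the `e_type` field of the ELF header: `ET_EXEC` (2) for a position-dependent executable, `ET_DYN` (3) for a shared object or PIE. The following is a minimal sketch of the same check; the function names `elf_kind` and `elf_kind_of_file` are hypothetical, and it assumes a little-endian ELF file, as on x86-64.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Classify an ELF image from its first bytes.
   e_ident occupies bytes 0..15; e_type is the 16-bit field at offset 16.
   Assumption: little-endian file (true for x86-64). */
const char *elf_kind(const unsigned char *hdr, size_t len)
{
    static const unsigned char magic[4] = {0x7f, 'E', 'L', 'F'};

    if (len < 18 || memcmp(hdr, magic, sizeof magic) != 0)
        return "not ELF";

    unsigned e_type = hdr[16] | ((unsigned)hdr[17] << 8);
    switch (e_type) {
    case 2:  return "ET_EXEC";  /* position-dependent executable */
    case 3:  return "ET_DYN";   /* shared object, which includes PIE */
    default: return "other";
    }
}

/* Read the first 18 bytes of the file at `path` and classify it. */
const char *elf_kind_of_file(const char *path)
{
    unsigned char hdr[18];
    FILE *f = fopen(path, "rb");

    if (!f)
        return "unreadable";
    size_t n = fread(hdr, 1, sizeof hdr, f);
    fclose(f);
    return elf_kind(hdr, n);
}
```

Calling `elf_kind_of_file("a.out")` on the two builds above would be expected to give `ET_EXEC` for the `-no-pie` build and `ET_DYN` for the default PIE build. Note that `ET_DYN` alone does not distinguish a PIE from an ordinary shared library.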
Semi-related (but not really): another recent gcc feature is `gcc -fno-plt`. Finally, calls into shared libraries can be just `call [rip + symbol@GOTPCREL]` (AT&T `call *puts@GOTPCREL(%rip)`), with no PLT trampoline.

Distros will hopefully start enabling it soon, because it also avoids needing writeable + executable memory pages. It's a significant speedup for programs that make a lot of shared-library calls, e.g. x86-64 `clang -O2 -g` compiling tramp3d goes from 41.6s to 36.8s on whatever hardware the patch author tested on. (clang is maybe a worst-case scenario for shared-library calls.)

It does require early binding instead of lazy dynamic linking, so it's slower for big programs that exit right away (e.g. `clang --version` or compiling `hello.c`). This slowdown could be reduced with prelink, apparently.

This doesn't remove the GOT overhead for external variables in shared library PIC code, though. (See the godbolt link above.)