Minimal executable size now 10x larger after linking than 2 years ago, for tiny programs?
Question
For a university course, I like to compare the code sizes of functionally similar programs written and compiled using gcc/clang versus assembly. While re-evaluating how to further shrink some executables, I couldn't believe my eyes: the very same assembly code I assembled and linked 2 years ago has now grown more than 10x in size after rebuilding it (which is true for multiple programs, not only helloworld):
$ make
as -32 -o helloworld-asm-2020.o helloworld-asm-2020.s
ld -melf_i386 -o helloworld-asm-2020 helloworld-asm-2020.o
$ ls -l
-rwxr-xr-x 1 xxx users 708 Jul 18 2018 helloworld-asm-2018*
-rwxr-xr-x 1 xxx users 8704 Nov 25 15:00 helloworld-asm-2020*
-rwxr-xr-x 1 xxx users 4724 Nov 25 15:00 helloworld-asm-2020-n*
-rwxr-xr-x 1 xxx users 4228 Nov 25 15:00 helloworld-asm-2020-n-sstripped*
-rwxr-xr-x 1 xxx users 604 Nov 25 15:00 helloworld-asm-2020.o*
-rw-r--r-- 1 xxx users 498 Nov 25 14:44 helloworld-asm-2020.s
The assembly code is:
.code32
.section .data
msg: .ascii "Hello, world!\n"
len = . - msg
.section .text
.globl _start
_start:
movl $len, %edx # EDX = message length
movl $msg, %ecx # ECX = address of message
movl $1, %ebx # EBX = file descriptor (1 = stdout)
movl $4, %eax # EAX = syscall number (4 = write)
int $0x80 # call kernel by interrupt
# and exit
movl $0, %ebx # return code is zero
movl $1, %eax # exit syscall number (1 = exit)
int $0x80 # call kernel again
The same hello world program, assembled with GNU as and linked with GNU ld (always using 32-bit assembly), was 708 bytes then and has grown to 8.5K now. Even when telling the linker to turn off page alignment (ld -n), it is still almost 4.2K. Stripping/sstripping doesn't pay off either.
readelf tells me that the section headers start much later in the file (byte 468 vs. 8464), but I have no idea why. It's running on the same arch system as in 2018, the Makefile is the same, and I'm not linking against any libraries (especially not libc). Since the object file is still quite small, I guess something about ld has changed, but what and why?
Disclaimer: I'm building 32-bit executables on an x86-64 machine.
I'm using GNU binutils (as & ld) version 2.35.1. Here is a base64-encoded archive which includes the source and both executables (small old one, large new one):
cat << EOF | base64 -d | tar xj
QlpoOTFBWSZTWVaGrEQABBp////xebj/7//Xf+a8RP/v3/rAAEVARARAeEADBAAAoCAI0AQ+NAam
ytMpCGmpDVPU0aNpGmh6Rpo9QAAeoBoADQaNAADQ09IAACSSGUwaJpTNQGE9QZGhoADQPUAA0AAA
AA0aA4AAAABoAAAAA0GgAAAAZAGgAHAAAAANAAAAAGg0AAAADIA0AASJCBIyE8hHpqPVPUPU/VAa
fqn6o0ep6BB6TQaNGj0j1ABobU00yeU9JYiuVVZKYE+dKNa3wls6x81yBpGAN71NoylDUvNryWiW
E4ER8XkfpaJcPb6ND12ULEqkQX3eaBHP70Apa5uFhWNDy+U3Ekj+OLx5MtDHxQHQLfMcgCHrGayE
Dc76F4ZC4rcRkvTW4S2EbJAsbBGbQxSbx5o48zkyk5iPBBhJowtCSwDBsQBc0koYRSO6SgJNL0Bg
EmCoxCDAs5QkEmTGmQUgqZNIoxsmwDmDQe0NIDI0KjQ64leOr1fVk6AaVhjOAJjLrEYkYy4cDbyS
iXSuILWohNh+PA9Izk0YUM4TQQGEYNgn4oEjGmAByO+kzmDIxEC3Txni6E1WdswBJLKYiANdiQ2K
00jU/zpMzuIhjTbgiBqE24dZWBcNBBAAioiEhCQEIfAR8Vir4zNQZFgvKZa67Jckh6EHZWAWuf6Q
kGy1lOtA2h9fsyD/uPPI2kjvoYL+w54IUKBEEYFBIWRNCNpuyY86v3pNiHEB7XyCX5wDjZUSF2tO
w0PVlY2FQNcLQcbZjmMhZdlCGkVHojuICHMMMB5kQQSZRwNJkYTKz6stT/MTWmozDCcj+UjtB9Cf
CUqAqqRlgJdREtMtSO4S4GpJE2I/P8vuO9ckqCM2+iSJCLRWx2Gi8VSR8BIkVX6stqIDmtG8xSVU
kk7BnC5caZXTIynyI0doXiFY1+/Csw2RUQJroC0lCNiIqVVUkTqTRMYqKNVGtCJ5yfo7e3ZpgECk
PYUEihPU0QVgfQ76JA8Eb16KCbSzP3WYiVApqmfDhUk0aVc+jyBJH13uKztUuva8F4YdbpmzomjG
kSJmP+vCFdKkHU384LdRoO0LdN7VJlywJ2xJdM+TMQ0KhMaicvRqfC5pHSu+gVDVjfiss+S00ikI
DeMgatVKKtcjsVDX09XU3SzowLWXXunnFZp/fP3eN9Rj1ubiLc0utMl3CUUkcYsmwbKKrWhaZiLO
u67kMSsW20jVBcZ5tZUKgdRtu0UleWOs1HK2QdMpyKMxTRHWhhHwMnVEsWIUEjIfFEbWhRTRMJXn
oIBSEa2Q0llTBfJV0LEYEQTBTFsDKIxhgqNwZB2dovl/kiW4TLp6aGXxmoIpVeWTEXqg1PnyKwux
caORGyBhTEPV2G7/O3y+KeAL9mUM4Zjl1DsDKyTZy8vgn31EDY08rY+64Z/LO5tcRJHttMYsz0Fh
CRN8LTYJL/I/4u5IpwoSCtDViIA=
EOF
Update: When using ld.gold instead of ld.bfd (to which /usr/bin/ld is symlinked by default), the executable size becomes as small as expected:
$ cat Makefile
TARGET=helloworld
all:
	as -32 -o ${TARGET}-asm.o ${TARGET}-asm.s
	ld.bfd -melf_i386 -o ${TARGET}-asm-bfd ${TARGET}-asm.o
	ld.gold -melf_i386 -o ${TARGET}-asm-gold ${TARGET}-asm.o
	rm ${TARGET}-asm.o
$ make -q
$ ls -l
total 68
-rw-r--r-- 1 eso eso 200 Dec 1 13:57 Makefile
-rwxrwxr-x 1 eso eso 8700 Dec 1 13:57 helloworld-asm-bfd
-rwxrwxr-x 1 eso eso 732 Dec 1 13:57 helloworld-asm-gold
-rw-r--r-- 1 eso eso 498 Dec 1 13:44 helloworld-asm.s
Maybe I just used gold previously without being aware of it.
Recommended answer
It's not 10x in general; it's page alignment of a couple of sections, as Jester says, following changes to ld's default linker script made for security reasons:
First change: making sure data from .data isn't present in any of the mappings of .text, so none of that static data is available for ROP / Spectre gadgets in an executable page. (In older ld, that meant the program headers mapped the same disk block twice, also into a RW-without-exec segment for the actual .data section. The executable mapping was still read-only.)
More recent change: separating .rodata from .text into separate segments, again so static data isn't mapped into an executable page. Previously, const char code[] = {...} could be cast to a function pointer and called without needing mprotect, gcc -z execstack, or other tricks, if you wanted to test shellcode that way. (A separate Linux kernel change made -z execstack apply only to the actual stack, not READ_IMPLIES_EXEC.)
See "Why an ELF executable could have 4 LOAD segments?" for this history, including the strange fact that .rodata is in a separate segment from the read-only mapping for access to the ELF metadata.
That extra space is just 00 padding and will compress well in a .tar.gz or whatever.
So it has a worst-case upper bound of about 2x 4k extra pages of padding, and tiny executables are close to that worst case.
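The padding arithmetic can be sketched in a few lines of shell. This is only an illustration: align_up is a hypothetical helper, and the 148-byte header size (a 52-byte ELF32 header plus three 32-byte program headers), the 34-byte .text size and the count of six section headers are assumptions, not values read from the actual files. The 468 and 8464 offsets and the 708 and 8704 byte totals are the ones reported in the question.

```shell
#!/bin/sh
# Round offset $1 up to the next multiple of alignment $2.
align_up() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

page=4096

# Assumed sizes: 148 bytes of ELF header + program headers, 34 bytes of code.
text_off=$(align_up 148 "$page")
data_off=$(align_up $(( text_off + 34 )) "$page")
echo "text at $text_off, data at $data_off"   # text at 4096, data at 8192

# Section header table: start offset (from readelf) plus an assumed
# 6 entries of 40 bytes each (40 is the size of an Elf32_Shdr).
echo "old file size: $(( 468 + 6 * 40 ))"     # old file size: 708
echo "new file size: $(( 8464 + 6 * 40 ))"    # new file size: 8704
```

Under these assumptions the two page-aligned gaps account for nearly all of the growth: the real content is a few hundred bytes either way.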
gcc -Wl,--nmagic will turn off page alignment of sections if you want that for some reason (see the ld(1) man page). I don't know why that doesn't pack everything down to the old size. Perhaps checking the default linker script would shed some light, but it's pretty long. Run ld --verbose to see it.
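Since the default script is long, one way to narrow it down is to filter for the page-size constants. The following is a rough sketch, assuming GNU ld built with the elf_i386 emulation; show_page_alignment is a hypothetical helper name.

```shell
#!/bin/sh
# Print the lines of ld's built-in elf_i386 linker script that refer to
# page-size alignment, with their line numbers within the --verbose output.
show_page_alignment() {
    ld -melf_i386 --verbose | grep -n -E 'MAXPAGESIZE|COMMONPAGESIZE'
}
```

Calling show_page_alignment lists the places where the script aligns the location counter to a page boundary, which is a reasonable starting point for reading the rest of the script.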
Stripping won't help for padding that's part of a section; I think it can only remove whole sections.
ld -z noseparate-code uses the old layout, with only 2 segments in total to cover the .text and .rodata sections, and the .data and .bss sections. (And the ELF metadata that dynamic linking wants access to.)
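To see the effect of these options side by side, something like the following could be used. It is a sketch only, assuming GNU as and GNU ld with elf_i386 support (binutils 2.31 or later for -z noseparate-code); build_variants and the hello-* file names are made up for illustration, and the resulting sizes will vary with the binutils version.

```shell
#!/bin/sh
# Link the same object three ways and print the resulting file sizes.
# $1 is a scratch directory (for example one created with mktemp -d).
build_variants() {
    dir=$1
    cat > "$dir/hello.s" <<'EOF'
    .section .data
msg: .ascii "Hello, world!\n"
len = . - msg
    .section .text
    .globl _start
_start:
    movl $len, %edx     # message length
    movl $msg, %ecx     # address of message
    movl $1, %ebx       # file descriptor 1 = stdout
    movl $4, %eax       # syscall 4 = write
    int $0x80
    movl $0, %ebx       # exit status 0
    movl $1, %eax       # syscall 1 = exit
    int $0x80
EOF
    # --32 is the documented spelling of the -32 used in the question.
    as --32 -o "$dir/hello.o" "$dir/hello.s" || return 1
    # Default layout (page-aligned, separate code segment on recent ld).
    ld -melf_i386 -o "$dir/hello-default" "$dir/hello.o" || return 1
    # Old layout: code and read-only data share one segment.
    ld -melf_i386 -z noseparate-code -o "$dir/hello-nosep" "$dir/hello.o" || return 1
    # Old layout with page alignment of sections turned off as well.
    ld -melf_i386 -n -z noseparate-code -o "$dir/hello-nosep-n" "$dir/hello.o" || return 1
    wc -c "$dir"/hello-*
}
```

Running build_variants "$(mktemp -d)" prints one size per variant; readelf -lW on each output shows how many LOAD segments the linker produced.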
This question is about ld, but note that if you're using gcc -nostdlib, that used to also default to making a static executable. But modern Linux distros configure GCC with -pie as the default, and GCC won't make a static-pie by default even if there aren't any shared libraries being linked, unlike in -no-pie mode, where it will simply make a static executable in that case. (A static-pie still needs startup code to apply relocations for any absolute addresses.)
So the equivalent of running ld directly is gcc -nostdlib -static (which implies -no-pie). Or gcc -nostdlib -no-pie should let it default to -static when there are no shared libs being linked. You can combine this with -Wl,--nmagic and/or -Wl,-z -Wl,noseparate-code.
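Put together, a gcc invocation along these lines might look as follows. This is a sketch under assumptions: build_static_nolibc is a hypothetical wrapper, the input is an assembly file that defines _start itself, and -m32 requires a GCC and binutils that can target 32-bit x86. -Wl,-z,noseparate-code is just a shorter spelling of -Wl,-z -Wl,noseparate-code.

```shell
#!/bin/sh
# Build a static, libc-free 32-bit executable through the gcc driver.
# $1 = assembly source defining _start, $2 = output file.
build_static_nolibc() {
    gcc -m32 -nostdlib -static \
        -Wl,--nmagic -Wl,-z,noseparate-code \
        -o "$2" "$1"
}
```

For example, build_static_nolibc helloworld-asm-2020.s helloworld-gcc would assemble and link the question's source in one step; the output name here is arbitrary.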
Related:
A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux - eventually making a 45-byte executable, with the machine code for an _exit syscall stuffed into the ELF program header itself.
FASM can make quite small executables, using its mode where it outputs a static executable (not an object file) directly, with no ELF section metadata, just program headers. (It's a pain to debug with GDB or disassemble with objdump; most tools assume there will be section headers, even though they're not needed to run static executables.)
What is a reasonable minimum number of assembly instructions for a small C program including setup?
What's the difference between "statically linked" and "not a dynamic executable" from Linux ldd? (static vs. static-pie vs. (dynamic) PIE that happens to have no shared libraries.)