Assembling 32-bit binaries on a 64-bit system (GNU toolchain)


Question

I wrote assembly code that assembles successfully:

as power.s -o power.o

However, it fails when I try to link the object file:

ld power.o -o power

To run it on a 64-bit OS (Ubuntu 14.04), I added .code32 at the beginning of the power.s file, but I still get the error:

Segmentation fault (core dumped)

power.s:

.code32
.section .data
.section .text
.global _start
_start:
pushl $3
pushl $2 
call power 
addl $8, %esp
pushl %eax 

pushl $2
pushl $5
call power
addl $8, %esp

popl %ebx
addl %eax, %ebx

movl $1, %eax
int $0x80



.type power, @function
power:
pushl %ebp  
movl %esp, %ebp 
subl $4, %esp 
movl 8(%ebp), %ebx 
movl 12(%ebp), %ecx 
movl %ebx, -4(%ebp) 

power_loop_start:
cmpl $1, %ecx 
je end_power
movl -4(%ebp), %eax
imull %ebx, %eax
movl %eax, -4(%ebp)

decl %ecx
jmp power_loop_start

end_power:
movl -4(%ebp), %eax 
movl %ebp, %esp
popl %ebp
ret

Recommended answer

TL;DR: use gcc -m32 -static -nostdlib foo.S (or the equivalent as and ld options).
Or, if you don't define your own _start, just gcc -m32 -no-pie foo.S

If you link libc, you may need to install gcc-multilib, or whatever package your distro uses to provide /usr/lib32/libc.so, /usr/lib32/libstdc++.so and so on. But if you define your own _start and don't link any libraries, you don't need the library package, just a kernel that supports 32-bit processes and system calls. That includes most distros, but not Windows Subsystem for Linux v1.

.code32 does not change the output file format, and the file format is what determines the mode your program will run in. It's up to you not to run 32-bit code in 64-bit mode. .code32 is for assembling things like kernels that mix 16-bit and 32-bit code. If that's not what you're doing, avoid it, so that you get build-time errors when you build a .S in the wrong mode (if it has any push or pop instructions, for example; without .code32, a 64-bit as rejects the pushl instructions in power.s). .code32 just lets you create confusing-to-debug run-time problems instead of build-time errors.
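To see which mode a file will actually run in, you can look at the ELF header rather than at the directives in the source. The following is a minimal sketch, not part of the original answer: elf_class is a hypothetical helper name, and it only assumes the standard ELF layout (magic bytes 0x7f 'E' 'L' 'F', then the EI_CLASS byte at offset 4).

```sh
# elf_class FILE: print "32-bit" or "64-bit" according to the ELF header.
# Hypothetical helper for illustration; file(1) reports the same information.
elf_class() {
    # The first four bytes of every ELF file are 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    [ "$magic" = "7f454c46" ] || { echo "not ELF"; return 1; }
    # The byte at offset 4 (EI_CLASS) is 1 for ELFCLASS32, 2 for ELFCLASS64.
    case $(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n') in
        1) echo "32-bit" ;;
        2) echo "64-bit" ;;
        *) echo "unknown ELF class"; return 1 ;;
    esac
}
```

On an x86-64 system, an object assembled with a plain as power.s -o power.o should report 64-bit here even though the source starts with .code32, which is the mismatch behind the crash in the question.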

Suggestion: use the .S extension for hand-written assembly. (gcc -c foo.S runs it through the C preprocessor before as, so you can #include <sys/syscall.h> for syscall numbers, for example.) It also distinguishes it from .s compiler output (from gcc foo.c -O3 -S).

gcc -g foo.S -o foo -m32 -nostdlib -static  # static binary with absolutely no libraries or startup code
                       # Without -static, -nostdlib still links dynamically on Linux systems where PIE is the default, and on OS X

gcc -g foo.S -o foo -m32 -no-pie            # dynamic binary including the startup boilerplate code.
     # Use with code that defines a main(), not a _start

See the GCC documentation for -nostdlib, -nostartfiles, and -static.

Some functions, like malloc(3), or stdio functions including printf(3), depend on some global data being initialized (e.g. FILE *stdout and the object it actually points to).

gcc -nostartfiles leaves out the CRT _start boilerplate code, but still links libc (dynamically, by default). On Linux, shared libraries can have initializer sections that the dynamic linker runs when it loads them, before jumping to your _start entry point. So gcc -nostartfiles hello.S still lets you call printf. For a dynamic executable, the kernel runs /lib/ld-linux.so.2 on it instead of running it directly (use readelf -a to see the "ELF interpreter" string in your binary). When your _start eventually runs, not all the registers will be zeroed, because the dynamic linker has already run code in your process.

However, gcc -nostartfiles -static hello.S will link, but crash at run time if you call printf or similar without first calling glibc's internal init functions. (See Michael Petch's comment.)

Of course, you can put any combination of .c, .S, and .o files on the same command line to link them all into one executable. If you have any C, don't forget -Og -Wall -Wextra: you don't want to be debugging your asm when the real problem is something simple in the C code that calls it, which the compiler could have warned you about.

Use -v to have gcc show you the commands it runs to assemble and link. To do it "manually":

as foo.S -o foo.o -g --32 &&      # skips the preprocessor
ld -o foo foo.o  -m elf_i386

file foo
foo: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, not stripped
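
Applied to the power.s file from the question, the manual steps look like the sketch below. gas32_cmds is a hypothetical helper, not something from the original answer; it only prints the two commands (using the same as and ld options shown above) so you can inspect them first, and it assumes a file name without spaces.

```sh
# gas32_cmds SRC: print the as/ld commands for a 32-bit static build of SRC.
# Hypothetical helper; pipe its output to sh to actually run the commands.
gas32_cmds() {
    src=$1
    base=${src%.*}                       # power.s -> power
    echo "as --32 -g $src -o $base.o"    # assemble as 32-bit code into an ELF32 object
    echo "ld -m elf_i386 $base.o -o $base"   # link as a 32-bit ELF executable
}

gas32_cmds power.s
```

After running those two commands (for example with gas32_cmds power.s | sh), ./power; echo $? should print 33, since the program exits with 2^3 + 5^2 as its status. Remove the .code32 line from power.s first, for the reasons given earlier.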

gcc -nostdlib -m32 is easier to remember and type than the two different options for as and ld (--32 and -m elf_i386). It also works on all platforms, including ones where the executable format isn't ELF. (But Linux examples won't work on OS X, because the system call numbers are different, or on Windows, because it doesn't even use the int 0x80 ABI.)

gcc can't handle NASM syntax. (-masm=intel is more like MASM than NASM syntax: you need offset symbol to get the address as an immediate.) And of course the directives are different (e.g. .globl vs global).

You can build with nasm or yasm, then link the .o with gcc as above, or with ld directly.

I use a wrapper script to avoid repeatedly typing the same filename with three different extensions. (nasm and yasm default to file.asm -> file.o, unlike GNU as's default output of a.out.) Use it with -m32 to assemble and link 32-bit ELF executables. Not all OSes use ELF, so this script is less portable than linking with gcc -nostdlib -m32 would be.

#!/bin/bash
# usage: asm-link [-q] [-m32] foo.asm  [assembler options ...]
# Just use a Makefile for anything non-trivial.  This script is intentionally minimal and doesn't handle multiple source files
# Copyright 2020 Peter Cordes.  Public domain.  If it breaks, you get to keep both pieces

verbose=1                       # defaults
fmt=-felf64
#ldopt=-melf_i386
ldlib=()

linker=ld
#dld=/lib64/ld-linux-x86-64.so.2
while getopts 'Gdsphl:m:nvqzN' opt; do
    case "$opt" in
        m)  if [ "m$OPTARG" = "m32" ]; then
                fmt=-felf32
                ldopt=-melf_i386
                #dld=/lib/ld-linux.so.2  # FIXME: handle linker=gcc non-static executable
            fi
            if [ "m$OPTARG" = "mx32" ]; then
                fmt=-felfx32
                ldopt=-melf32_x86_64
            fi
            ;;
        #   -static
        l)  linker="gcc -no-pie -fno-plt -nostartfiles"; ldlib+=("-l$OPTARG");;
        p)  linker="gcc -pie -fno-plt -nostartfiles"; ldlib+=("-pie");;
        h)  ldlib+=("-Ttext=0x200800000");;   # symbol addresses outside the low 32.  data and bss go in range of text
                          # strace -e raw=write  will show the numeric address
        G)  nodebug=1;;      # .label: doesn't break up objdump output
        d)  disas=1;;
        s)  runsize=1;;
        n)  use_nasm=1 ;;
        q)  verbose=0 ;;
        v)  verbose=1 ;;
        z)  ldlib+=("-zexecstack") ;;
        N)  ldlib+=("-N") ;;   # --omagic = read+write text section
    esac
done
shift "$((OPTIND-1))"   # Shift off the options and optional --

src=$1
base=${src%.*}
shift

#if [[ ${#ldlib[@]} -gt 0 ]]; then
    #    ldlib+=("--dynamic-linker" "$dld")
    #ldlib=("-static" "${ldlib[@]}")
#fi

set -e
if (($use_nasm)); then
  #  (($nodebug)) || dbg="-g -Fdwarf"     # breaks objdump disassembly, and .labels are included anyway
    ( (($verbose)) && set -x    # print commands as they're run, like make
    nasm "$fmt" -Worphan-labels $dbg  "$src" "$@" &&
        $linker $ldopt -o "$base" "$base.o"  "${ldlib[@]}")
else
    (($nodebug)) || dbg="-gdwarf2"
    ( (($verbose)) && set -x    # print commands as they're run, like make
    yasm "$fmt" -Worphan-labels $dbg "$src" "$@" &&
        $linker $ldopt -o "$base" "$base.o"  "${ldlib[@]}" )
fi

# yasm -gdwarf2 includes even .local labels so they show up in objdump output
# nasm defaults to that behaviour of including even .local labels

# nasm defaults to STABS debugging format, but -g is not the default

if (($disas));then
    objdump -drwC -Mintel "$base"
fi

if (($runsize));then
    size "$base"
fi
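
The part of the script that matters for this question is the -m option, which picks the assembler output format and the matching ld emulation. As a rough summary of that case statement (asm_link_modes is a hypothetical name used only for this sketch; the option strings are copied from the script):

```sh
# asm_link_modes MODE: print the assembler format and linker emulation that
# the wrapper script selects for "-m MODE" (any other value keeps the default).
asm_link_modes() {
    case "$1" in
        32)  echo "-felf32 -melf_i386" ;;        # 32-bit ELF for i386
        x32) echo "-felfx32 -melf32_x86_64" ;;   # x32 ABI
        *)   echo "-felf64" ;;                   # default: 64-bit, no ld -m option
    esac
}

asm_link_modes 32
```

So asm-link -m32 foo.asm ends up running yasm -felf32 -Worphan-labels -gdwarf2 foo.asm followed by ld -melf_i386 -o foo foo.o (or the nasm equivalent with -n).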

I prefer YASM for a few reasons, including that it defaults to making long NOPs instead of padding with many single-byte NOPs. Padding with single-byte NOPs makes for messy disassembly output, and is slower if the NOPs ever run. (In NASM, you have to use the smartalign macro package.)

However, YASM hasn't been maintained for a while, and only NASM has AVX512 support; these days I more often just use NASM.

# hello32.S

#include <asm/unistd_32.h>   // syscall numbers.  only #defines, no C declarations left after CPP to cause asm syntax errors

.text
#.global main   # uncomment these to let this code work as _start, or as main called by glibc _start
#main:
#.weak _start

.global _start
_start:
        mov     $__NR_gettimeofday, %eax  # make a syscall that we can see in strace output so we know when we get here
        int     $0x80

        push    %esp
        push    $print_fmt
        call   printf

        #xor    %ebx,%ebx                 # _exit(0)
        #mov    $__NR_exit_group, %eax    # same as glibc's _exit(2) wrapper
        #int    $0x80                     # won't flush the stdio buffer

        movl    $0, (%esp)   # reuse the stack slots we set up for printf, instead of popping
        call    exit         # exit(3) does an fflush and other cleanup

        #add    $8, %esp     # pop the space reserved by the two pushes
        #ret                 # only works in main, not _start

.section .rodata
print_fmt: .asciz "Hello, World!\n%%esp at startup = %#lx\n"


$ gcc -m32 -nostdlib hello32.S
/tmp/ccHNGx24.o: In function `_start':
(.text+0x7): undefined reference to `printf'
...
$ gcc -m32 hello32.S
/tmp/ccQ4SOR8.o: In function `_start':
(.text+0x0): multiple definition of `_start'
...


The static build fails at run time, because nothing calls the glibc init functions. (__libc_init_first, __dl_tls_setup, and __libc_csu_init, in that order, according to Michael Petch's comment. Other libc implementations exist, including MUSL, which is designed for static linking and works without initialization calls.)

$ gcc -m32 -nostartfiles -static hello32.S     # fails at run-time
$ file a.out
a.out: ELF 32-bit LSB executable, Intel 80386, version 1 (GNU/Linux), statically linked, BuildID[sha1]=ef4b74b1c29618d89ad60dbc6f9517d7cdec3236, not stripped
$ strace -s128 ./a.out
execve("./a.out", ["./a.out"], [/* 70 vars */]) = 0
[ Process PID=29681 runs in 32 bit mode. ]
gettimeofday(NULL, NULL)                = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0} ---
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)

You could also run gdb ./a.out, then use b _start, layout reg, and run, and see what happens.

$ gcc -m32 -nostartfiles hello32.S             # Correct command line
$ file a.out
a.out: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, BuildID[sha1]=7b0a731f9b24a77bee41c13ec562ba2a459d91c7, not stripped

$ ./a.out
Hello, World!
%esp at startup = 0xffdf7460

$ ltrace -s128 ./a.out > /dev/null
printf("Hello, World!\n%%esp at startup = %#lx\n", 0xff937510)      = 43    # note the different address: Address-space layout randomization at work
exit(0 <no return ...>
+++ exited (status 0) +++

$ strace -s128 ./a.out > /dev/null        # redirect stdout so we don't see a mix of normal output and trace output
execve("./a.out", ["./a.out"], [/* 70 vars */]) = 0
[ Process PID=29729 runs in 32 bit mode. ]
brk(0)                                  = 0x834e000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
....   more syscalls from dynamic linker code
open("/lib/i386-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
mmap2(NULL, 1814236, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xfffffffff7556000    # map the executable text section of the library
... more stuff
# end of dynamic linker's code, finally jumps to our _start

gettimeofday({1461874556, 431117}, NULL) = 0
fstat64(1, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0  # stdio is figuring out whether stdout is a terminal or not
ioctl(1, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0xff938870) = -1 ENOTTY (Inappropriate ioctl for device)
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xfffffffff7743000      # 4k buffer for stdout
write(1, "Hello, World!\n%esp at startup = 0xff938fb0\n", 43) = 43
exit_group(0)                           = ?
+++ exited with 0 +++

If we'd used _exit(0), or made the sys_exit system call ourselves with int 0x80, the write(2) wouldn't have happened. With stdout redirected to a non-tty, it defaults to full-buffered (not line-buffered), so the write(2) is only triggered by the fflush(3) that runs as part of exit(3). Without redirection, calling printf(3) with a string containing newlines flushes immediately.

Behaving differently depending on whether stdout is a terminal can be desirable, but only if you do it on purpose, not by mistake.
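
The terminal check that stdio makes (the fstat64 and TCGETS ioctl calls in the strace output) has a direct shell counterpart in test -t. This is only an analogy to illustrate the decision, not what glibc runs internally; stdout_kind is a hypothetical name.

```sh
# stdout_kind: report whether standard output (fd 1) is a terminal.
# A program using stdio makes the same kind of check to choose between
# line buffering (terminal) and full buffering (pipe or file).
stdout_kind() {
    if [ -t 1 ]; then
        echo "terminal"
    else
        echo "not a terminal"
    fi
}

stdout_kind
```

Run directly in an interactive shell this prints terminal; piped through cat, or captured with $(stdout_kind), it prints not a terminal, which is the situation where the hello32 output only appears if exit(3) gets to flush the buffer.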
