Hello, world in assembly language with Linux system calls?

    Problem description

    1. I know that int 0x80 raises an interrupt in Linux, but I don't understand how this code works. Does it return something?

    2. What does $ - msg stand for?

    global _start
    
    section .data
        msg db "Hello, world!", 0x0a
        len equ $ - msg
    
    section .text
    _start:
        mov eax, 4
        mov ebx, 1
        mov ecx, msg
        mov edx, len
        int 0x80 ;What is this?
        mov eax, 1
        mov ebx, 0
        int 0x80 ;and what is this?
    

    Solution

    How does $ work in NASM, exactly? explains how $ - msg gets NASM to calculate the string length as an assemble-time constant for you, instead of hard-coding it.


    I originally wrote the rest of this for SO Docs (topic ID: 1164, example ID: 19078), rewriting a basic less-well-commented example by @runner. This looks like a better place to put it than as part of my answer to another question where I had previously moved it after the SO docs experiment ended.


    Making a system call is done by putting arguments into registers, then running int 0x80 (32-bit mode) or syscall (64-bit mode). See What are the calling conventions for UNIX & Linux system calls on i386 and x86-64 and The Definitive Guide to Linux System Calls.

    Think of int 0x80 as a way to "call" into the kernel, across the user/kernel privilege boundary. The kernel does stuff according to the values that were in registers when int 0x80 executed, then eventually returns. The return value is in EAX.

    When execution reaches the kernel's entry point, it looks at EAX and dispatches to the right system call based on the call number in EAX. Values from other registers are passed as function args to the kernel's handler for that system call. (e.g. eax=4 / int 0x80 will get the kernel to call its sys_write kernel function, implementing the POSIX write system call.)
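As a rough C-level illustration of "call number plus args in, return value out", the sketch below invokes write by its call number through libc's generic syscall(2) wrapper instead of the write(2) wrapper. The function name write_by_number is hypothetical, and the wrapper uses whatever entry mechanism is native to the build target, not necessarily int 0x80. One difference from the raw asm interface: the kernel hands back a negative errno code in EAX on failure, while the libc wrapper converts that into a return value of -1 and sets errno.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: make the write system call by number.
 * Returns the number of bytes written, or -1 with errno set
 * (the libc wrapper translates the kernel's negative-errno return). */
static long write_by_number(int fd, const void *buf, size_t len)
{
    assert(buf != NULL);
    return syscall(SYS_write, fd, buf, len);
}
```

Calling write_by_number(1, "Hello, world!\n", 14) should print the message and return 14, which corresponds to the value the asm version finds in EAX after int 0x80.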

    And see also What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code? - that answer includes a look at the asm in the kernel entry point that is "called" by int 0x80. (Also applies to 32-bit user-space, not just 64-bit where you shouldn't use int 0x80).


    If you don't already know low-level Unix systems programming, you might want to just write functions in asm that take args and return a value (or update arrays via a pointer arg) and call them from C or C++ programs. Then you can just worry about learning how to handle registers and memory, without also learning the POSIX system-call API and the ABI for using it. That also makes it very easy to compare your code with compiler output for a C implementation. Compilers usually do a pretty good job at making efficient code, but are rarely perfect.

    libc provides wrapper functions for system calls, so compiler-generated code would call write rather than invoking it directly with int 0x80 (or if you care about performance, sysenter). (In x86-64 code, use syscall for the 64-bit ABI.) See also syscalls(2).

    System calls are documented in section 2 manual pages, like write(2). See the NOTES section for differences between the libc wrapper function and the underlying Linux system call. Note that the wrapper for sys_exit is _exit(2), not the exit(3) ISO C function that flushes stdio buffers and does other cleanup first. There's also an exit_group system call that ends all threads. exit(3) actually uses that, because there's no downside in a single-threaded process.

    This code makes 2 system calls:

      • sys_write(1, "Hello, World!\n", sizeof(...));
      • sys_exit(0);

    I commented it heavily (to the point where it's starting to obscure the actual code without color syntax highlighting). This is an attempt to point things out to total beginners, not how you should comment your code normally.

    section .text             ; Executable code goes in the .text section
    global _start             ; The linker looks for this symbol to set the process entry point, so execution starts here
    ;;;a name followed by a colon defines a symbol.  The global _start directive modifies it so it's a global symbol, not just one that we can CALL or JMP to from inside the asm.
    ;;; note that _start isn't really a "function".  You can't return from it, and the kernel passes argc, argv, and env differently than main() would expect.
     _start:
        ;;; write(1, msg, len);
        ; Start by moving the arguments into registers, where the kernel will look for them
        mov     edx,len       ; 3rd arg goes in edx: buffer length
        mov     ecx,msg       ; 2nd arg goes in ecx: pointer to the buffer
        ;Set output to stdout (goes to your terminal, or wherever you redirect or pipe)
        mov     ebx,1         ; 1st arg goes in ebx: Unix file descriptor. 1 = stdout, which is normally connected to the terminal.
    
        mov     eax,4         ; system call number (from SYS_write / __NR_write from unistd_32.h).
        int     0x80          ; generate an interrupt, activating the kernel's system-call handling code.  64-bit code uses a different instruction, different registers, and different call numbers.
        ;; eax = return value, all other registers unchanged.
    
        ;;;Second, exit the process.  There's nothing to return to, so we can't use a ret instruction (like we could if this was main() or any function with a caller)
        ;;; If we don't exit, execution continues into whatever bytes are next in the memory page,
        ;;; typically leading to a segmentation fault because the padding 00 00 decodes to  add [eax],al.
    
        ;;; _exit(0);
        xor     ebx,ebx       ; first arg = exit status = 0.  (will be truncated to 8 bits).  Zeroing registers is a special case on x86, and mov ebx,0 would be less efficient.
                          ;; leaving out the zeroing of ebx would mean we exit(1), i.e. with an error status, since ebx still holds 1 from earlier.
        mov     eax,1         ; put __NR_exit into eax
        int     0x80          ;Execute the Linux function
    
    section     .rodata       ; Section for read-only constants
    
                 ;; msg is a label, and in this context doesn't need to be msg:.  It could be on a separate line.
                 ;; db = Data Bytes: assemble some literal bytes into the output file.
    msg     db  'Hello, world!',0xa     ; ASCII string constant plus a newline (0xa = 10)
    
                 ;;  No terminating zero byte is needed, because we're using write(), which takes a buffer + length instead of an implicit-length string.
                 ;; To make this a C string that we could pass to puts or strlen, we'd need a terminating 0 byte. (e.g. "...", 0xa, 0)
    
    len     equ $ - msg       ; Define an assemble-time constant (not stored by itself in the output file, but will appear as an immediate operand in insns that use it)
                              ; Calculate len = string length.  subtract the address of the start
                              ; of the string from the current position ($)
      ;; equivalently, we could have put a str_end: label after the string and done   len equ str_end - msg
    

    Notice that we don't store the string length in data memory anywhere. It's an assemble-time constant, so it's more efficient to have it as an immediate operand than a load. We could also have pushed the string data onto the stack with push imm32 instructions (four of them for this 14-byte string), but bloating the code-size too much isn't a good thing.
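A loose C analogue of len equ $ - msg, offered only as a sketch: sizeof on a char array is evaluated at compile time, much like NASM evaluates $ - msg at assemble time. The names MSG_LEN and msg_len_runtime are hypothetical. Unlike db in NASM, a C string literal gets an implicit terminating 0 byte, so it has to be subtracted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char msg[] = "Hello, world!\n";

/* sizeof msg counts the implicit terminating 0 byte, so subtract 1.
 * The result is a compile-time constant, like NASM's  len equ $ - msg. */
enum { MSG_LEN = sizeof msg - 1 };

/* Checked at compile time; nothing is stored or computed at run time. */
static_assert(MSG_LEN == 14, "13 characters plus the newline");

/* For contrast: the same length computed at run time by scanning for the 0 byte. */
static size_t msg_len_runtime(void)
{
    return strlen(msg);
}
```

Both give 14 here, but only MSG_LEN can be folded into an instruction as an immediate operand without any load or loop.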


    On Linux, you can save this file as Hello.asm and build a 32-bit executable from it with these commands:

    nasm -felf32 Hello.asm                  # assemble as 32-bit code.  Add -Worphan-labels -g -Fdwarf  for debug symbols and warnings
    gcc -static -nostdlib -m32 Hello.o -o Hello     # link without CRT startup code or libc, making a static binary
    

    See this answer for more details on building assembly into 32 or 64-bit static or dynamically linked Linux executables, for NASM/YASM syntax or GNU AT&T syntax with GNU as directives. (Key point: make sure to use -m32 or equivalent when building 32-bit code on a 64-bit host, or you will have confusing problems at run-time.)


    You can trace its execution with strace to see the system calls it makes:

    $ strace ./Hello 
    execve("./Hello", ["./Hello"], [/* 72 vars */]) = 0
    [ Process PID=4019 runs in 32 bit mode. ]
    write(1, "Hello, world!\n", 14Hello, world!
    )         = 14
    _exit(0)                                = ?
    +++ exited with 0 +++
    

    Compare this with the trace for a dynamically linked process (like gcc makes from hello.c, or from running strace /bin/ls) to get an idea just how much stuff happens under the hood for dynamic linking and C library startup.

    The trace on stderr and the regular output on stdout are both going to the terminal here, so they interfere in the line with the write system call. Redirect or trace to a file if you care. Notice how this lets us easily see the syscall return values without having to add code to print them, and is actually even easier than using a regular debugger (like gdb) to single-step and look at eax for this. See the bottom of the x86 tag wiki for gdb asm tips. (The rest of the tag wiki is full of links to good resources.)

    The x86-64 version of this program would be extremely similar, passing the same args to the same system calls, just in different registers and with syscall instead of int 0x80. See the bottom of What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code? for a working example of writing a string and exiting in 64-bit code.
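To make the 64-bit register mapping concrete, here is a hedged sketch using GNU C inline asm (GCC or Clang assumed). The function name raw_write is hypothetical. On x86-64 Linux the call number goes in rax and the first three args in rdi, rsi and rdx; the syscall instruction itself clobbers rcx and r11. On any other target the sketch simply falls back to libc's syscall(2) wrapper so that it still builds.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: write(fd, buf, len) via the x86-64 syscall instruction.
 * On x86-64 the raw return value is the byte count, or a negative errno code. */
static long raw_write(int fd, const void *buf, unsigned long len)
{
    assert(buf != NULL);
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)                /* rax: return value */
                      : "a"((long)SYS_write),    /* rax: call number (1 on x86-64) */
                        "D"((long)fd),           /* rdi: 1st arg */
                        "S"(buf),                /* rsi: 2nd arg */
                        "d"(len)                 /* rdx: 3rd arg */
                      : "rcx", "r11", "memory"); /* clobbered by syscall */
    return ret;
#else
    /* Fallback for other targets: libc wrapper, returns -1 and sets errno on error. */
    return syscall(SYS_write, fd, buf, len);
#endif
}
```

Compared with the 32-bit listing, the arguments are the same (fd, buffer pointer, length), but every register and the call number differ, which is why the 32-bit source cannot just be reassembled as 64-bit code.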


    related: A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux. The smallest binary file you can run that just makes an exit() system call. That is about minimizing the binary size, not the source size or even just the number of instructions that actually run.
