What are the calling conventions for UNIX & Linux system calls on i386 and x86-64


Question

The following links explain the x86-32 system call conventions for both UNIX (BSD flavor) and Linux:

But what are the x86-64 system call conventions on both UNIX & Linux?

Solution

Further reading for any of the topics here: The Definitive Guide to Linux System Calls


I verified these using GNU Assembler (gas) on Linux.

Kernel Interface

x86-32 aka i386 Linux System Call convention:

In x86-32, parameters for a Linux system call are passed in registers. %eax holds the syscall number; %ebx, %ecx, %edx, %esi, %edi and %ebp carry up to six arguments, in that order.

The return value is in %eax. All other registers (including EFLAGS) are preserved across the int $0x80.

I took the following snippet from the Linux Assembly Tutorial, but I'm doubtful about it. If anyone can show an example, that would be great.

If there are more than six arguments, %ebx must contain the memory location where the list of arguments is stored - but don't worry about this because it's unlikely that you'll use a syscall with more than six arguments.

For an example and a little more reading, refer to http://www.int80h.org/bsdasm/#alternate-calling-convention. Another example of a Hello World for i386 Linux using int 0x80: What parts of this HelloWorld assembly code are essential if I were to write the program in assembly?

There is a faster way to make 32-bit system calls: using sysenter. The kernel maps a page of memory into every process (the vDSO), with the user-space side of the sysenter dance, which has to cooperate with the kernel for it to be able to find the return address. Arg to register mapping is the same as for int $0x80. You should normally call into the vDSO instead of using sysenter directly. (See The Definitive Guide to Linux System Calls for info on linking and calling into the vDSO, and for more info on sysenter, and everything else to do with system calls.)

x86-32 [Free|Open|Net|DragonFly]BSD UNIX System Call convention:

Parameters are passed on the stack. Push the parameters onto the stack (last parameter first), then push an additional 32 bits of dummy data (it's not actually dummy data; refer to the following link for more info), and then execute the system call instruction int $0x80.

http://www.int80h.org/bsdasm/#default-calling-convention


x86-64 Linux System Call convention:

x86-64 Mac OS X is similar but different. TODO: check what *BSD does.

Refer to section: "A.2 AMD64 Linux Kernel Conventions" of System V Application Binary Interface AMD64 Architecture Processor Supplement. The latest versions of the i386 and x86-64 System V psABIs can be found linked from this page in the ABI maintainer's repo. (See also the tag wiki for up-to-date ABI links and lots of other good stuff about x86 asm.)

Here is the snippet from this section:

  1. User-level applications use as integer registers for passing the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9. The kernel interface uses %rdi, %rsi, %rdx, %r10, %r8 and %r9.
  2. A system-call is done via the syscall instruction. This clobbers %rcx and %r11 as well as the %rax return value, but other registers are preserved.
  3. The number of the syscall has to be passed in register %rax.
  4. System-calls are limited to six arguments, no argument is passed directly on the stack.
  5. Returning from the syscall, register %rax contains the result of the system-call. A value in the range between -4095 and -1 indicates an error, it is -errno.
  6. Only values of class INTEGER or class MEMORY are passed to the kernel.

Remember this is from the Linux-specific appendix to the ABI, and even for Linux it's informative not normative. (But it is in fact accurate.)
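The kernel interface quoted above can be sketched in C with GNU inline asm. This is a minimal sketch, assuming x86-64 Linux and GCC or Clang; `raw_syscall6`, `is_syscall_error` and `raw_mmap_anon` are hypothetical helper names made up for illustration, not part of any library.

```c
#define _GNU_SOURCE
#include <errno.h>        /* EBADF */
#include <sys/mman.h>     /* PROT_*, MAP_* */
#include <sys/syscall.h>  /* SYS_write, SYS_close, SYS_mmap, SYS_munmap */

/* Hypothetical helper (not a libc API): issue a raw x86-64 Linux system call. */
static long raw_syscall6(long nr, long a1, long a2, long a3,
                         long a4, long a5, long a6)
{
    long ret;
    /* r10, r8 and r9 have no single-letter constraint, so pin them
       with register variables. */
    register long r10 __asm__("r10") = a4;  /* 4th arg: %r10, not %rcx */
    register long r8  __asm__("r8")  = a5;
    register long r9  __asm__("r9")  = a6;

    __asm__ volatile (
        "syscall"
        : "=a"(ret)                     /* result (or -errno) comes back in %rax */
        : "a"(nr),                      /* syscall number goes in %rax */
          "D"(a1), "S"(a2), "d"(a3),    /* %rdi, %rsi, %rdx */
          "r"(r10), "r"(r8), "r"(r9)
        : "rcx", "r11", "memory");      /* syscall clobbers %rcx and %r11 */
    return ret;
}

/* A return value in [-4095, -1] is -errno; anything else is a result. */
static int is_syscall_error(long ret)
{
    return ret >= -4095 && ret <= -1;
}

/* Map anonymous memory; this call uses all six argument registers. */
static long raw_mmap_anon(long length)
{
    return raw_syscall6(SYS_mmap, 0, length, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}
```

For example, `raw_syscall6(SYS_close, -1, 0, 0, 0, 0, 0)` returns `-EBADF` directly in %rax; unlike the libc wrappers, the raw interface never touches `errno`.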

The 32-bit int $0x80 ABI is usable in 64-bit code, but this is strongly discouraged. (See "What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?") It still truncates its inputs to 32 bits, so it's unsuitable for pointers, and it zeros r8-r11.

User Interface: function calling

x86-32 Function Calling convention:

In x86-32, parameters are passed on the stack. The last parameter is pushed first, and so on until all parameters have been pushed; then the call instruction is executed. This is the convention used for calling C library (libc) functions on Linux from assembly.

Modern versions of the i386 System V ABI (used on Linux) require 16-byte alignment of %esp before a call, like the x86-64 System V ABI has always required. Callees are allowed to assume that and use SSE 16-byte loads/stores that fault on unaligned. But historically, Linux only required 4-byte stack alignment, so it took extra work to reserve naturally-aligned space even for an 8-byte double or something.

Some other modern 32-bit systems still don't require more than 4-byte stack alignment.


x86-64 System V user-space Function Calling convention:

x86-64 System V passes args in registers, which is more efficient than i386 System V's stack args convention. It avoids the latency and extra instructions of storing args to memory (cache) and then loading them back again in the callee. This works well because there are more registers available, and is better for modern high-performance CPUs where latency and out-of-order execution matter. (The i386 ABI is very old).

In this newer mechanism, the parameters are first divided into classes. The class of each parameter determines the manner in which it is passed to the called function.

For complete information, refer to "3.2 Function Calling Sequence" of the System V Application Binary Interface AMD64 Architecture Processor Supplement, which reads, in part:

Once arguments are classified, the registers get assigned (in left-to-right order) for passing as follows:

  1. If the class is MEMORY, pass the argument on the stack.
  2. If the class is INTEGER, the next available register of the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9 is used

So %rdi, %rsi, %rdx, %rcx, %r8 and %r9 are, in order, the registers used to pass integer/pointer (i.e. INTEGER class) parameters to any libc function from assembly. %rdi is used for the first INTEGER parameter, %rsi for the 2nd, %rdx for the 3rd, and so on. Then the call instruction is executed. The stack (%rsp) must be 16-byte aligned when call executes.

If there are more than 6 INTEGER parameters, the 7th INTEGER parameter and later are passed on the stack. (Caller pops, same as x86-32.)
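As an illustration of the register order and the stack-passed 7th argument, here is a minimal sketch, assuming x86-64 Linux with GCC or Clang and GNU as. The caller is written in AT&T syntax inside a file-scope asm block; `weighted7` and `call_weighted7` are hypothetical names invented for this example.

```c
/* Callee: an ordinary C function with seven INTEGER-class arguments.
   The weights make the result depend on which argument landed where. */
long weighted7(long a, long b, long c, long d, long e, long f, long g)
{
    return a + 10 * b + 100 * c + 1000 * d + 10000 * e
             + 100000 * f + 1000000 * g;
}

/* Caller: defined in assembly below, callable from C. */
long call_weighted7(void);

__asm__(
    ".pushsection .text\n"
    ".globl call_weighted7\n"
    ".type call_weighted7, @function\n"
    "call_weighted7:\n"
    /* On entry %rsp is 8 mod 16 (the return address was just pushed).
       Pushing the 8-byte 7th argument makes it 16-byte aligned again. */
    "    push $7\n"            /* 7th INTEGER arg goes on the stack */
    "    mov  $1, %edi\n"      /* 1st arg: %rdi */
    "    mov  $2, %esi\n"      /* 2nd arg: %rsi */
    "    mov  $3, %edx\n"      /* 3rd arg: %rdx */
    "    mov  $4, %ecx\n"      /* 4th arg: %rcx */
    "    mov  $5, %r8d\n"      /* 5th arg: %r8  */
    "    mov  $6, %r9d\n"      /* 6th arg: %r9  */
    "    call weighted7\n"     /* result comes back in %rax */
    "    add  $8, %rsp\n"      /* caller pops the stack argument */
    "    ret\n"
    ".size call_weighted7, .-call_weighted7\n"
    ".popsection\n"
);
```

Writing to the 32-bit register names (e.g. %edi) zero-extends into the full 64-bit register, which is why small constants can be loaded with the shorter encodings.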

The first 8 floating-point args are passed in %xmm0-7, later ones on the stack. There are no call-preserved vector registers. (A function with a mix of FP and integer arguments can have more than 8 total register arguments.)

Variadic functions (like printf) always need %al = the number of FP register args.
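A variadic call with one floating-point argument might look like the following. Again this is only a sketch, assuming x86-64 Linux, GCC or Clang, and glibc's `snprintf`; `format_demo` and the two data labels are hypothetical names. It is roughly equivalent to calling `snprintf(buf, size, "%d %.1f", 42, 2.5)`.

```c
/* Defined in assembly below: formats "42 2.5" into buf via snprintf. */
int format_demo(char *buf, unsigned long size);

__asm__(
    ".pushsection .rodata\n"
    "format_demo_fmt: .asciz \"%d %.1f\"\n"
    "    .balign 8\n"
    "format_demo_val: .double 2.5\n"
    ".popsection\n"
    ".pushsection .text\n"
    ".globl format_demo\n"
    ".type format_demo, @function\n"
    "format_demo:\n"
    "    sub   $8, %rsp\n"                       /* realign %rsp to 16 bytes */
    /* buf and size are already in %rdi and %rsi (our own 1st/2nd args). */
    "    lea   format_demo_fmt(%rip), %rdx\n"    /* 3rd INTEGER arg: format */
    "    mov   $42, %ecx\n"                      /* 4th INTEGER arg */
    "    movsd format_demo_val(%rip), %xmm0\n"   /* 1st FP arg */
    "    mov   $1, %eax\n"                       /* %al = 1 vector register used */
    "    call  snprintf@PLT\n"
    "    add   $8, %rsp\n"
    "    ret\n"                                  /* snprintf's result stays in %eax */
    ".size format_demo, .-format_demo\n"
    ".popsection\n"
);
```

Integer and FP arguments are counted separately, so the double goes in %xmm0 even though it is the fifth argument overall.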

There are rules for when to pack structs into registers (rdx:rax on return) vs. in memory. See the ABI for details, and check compiler output to make sure your code agrees with compilers about how something should be passed/returned.


Note that the Windows x64 function calling convention has multiple significant differences from x86-64 System V, like shadow space that must be reserved by the caller (instead of a red-zone), call-preserved xmm6-xmm15, and very different rules for which arg goes in which register.
