Assembly - Passing parameters to a function call

Problem description

I am currently practicing reading assembly by disassembling C programs and trying to understand what they do.

I am stuck with a trivial one: a simple hello world program.

#include <stdio.h>
#include <stdlib.h>

int main() {
  printf("Hello, world!");
  return(0);
}

When I disassemble main:

(gdb) disassemble main
Dump of assembler code for function main:
   0x0000000000400526 <+0>: push   rbp
   0x0000000000400527 <+1>: mov    rbp,rsp
   0x000000000040052a <+4>: mov    edi,0x4005c4
   0x000000000040052f <+9>: mov    eax,0x0
   0x0000000000400534 <+14>:    call   0x400400 <printf@plt>
   0x0000000000400539 <+19>:    mov    eax,0x0  
   0x000000000040053e <+24>:    pop    rbp
   0x000000000040053f <+25>:    ret

I understand the first two lines: the base pointer is saved on the stack (by push rbp, which decreases the stack pointer by 8 because the stack has "grown"), and the value of the stack pointer is saved in the base pointer (so that parameters and local variables can be reached easily through positive and negative offsets, respectively, while the stack can keep "growing").

The third line presents the first issue: why is 0x4005c4 (the address of the "Hello, world!" string) moved into the edi register instead of being pushed onto the stack? Shouldn't the printf function take the address of that string as a parameter? As far as I know, functions take their parameters from the stack, but here the parameter seems to be put in a register: edi.

In another post here on Stack Overflow I read that "printf@plt" is like a stub function that calls the real printf function. I tried to disassemble that function, but it gets even more confusing:

(gdb) disassemble printf
Dump of assembler code for function __printf:
   0x00007ffff7a637b0 <+0>: sub    rsp,0xd8
   0x00007ffff7a637b7 <+7>: test   al,al
   0x00007ffff7a637b9 <+9>: mov    QWORD PTR [rsp+0x28],rsi
   0x00007ffff7a637be <+14>:    mov    QWORD PTR [rsp+0x30],rdx
   0x00007ffff7a637c3 <+19>:    mov    QWORD PTR [rsp+0x38],rcx
   0x00007ffff7a637c8 <+24>:    mov    QWORD PTR [rsp+0x40],r8
   0x00007ffff7a637cd <+29>:    mov    QWORD PTR [rsp+0x48],r9
   0x00007ffff7a637d2 <+34>:    je     0x7ffff7a6380b <__printf+91>
   0x00007ffff7a637d4 <+36>:    movaps XMMWORD PTR [rsp+0x50],xmm0
   0x00007ffff7a637d9 <+41>:    movaps XMMWORD PTR [rsp+0x60],xmm1
   0x00007ffff7a637de <+46>:    movaps XMMWORD PTR [rsp+0x70],xmm2
   0x00007ffff7a637e3 <+51>:    movaps XMMWORD PTR [rsp+0x80],xmm3
   0x00007ffff7a637eb <+59>:    movaps XMMWORD PTR [rsp+0x90],xmm4
   0x00007ffff7a637f3 <+67>:    movaps XMMWORD PTR [rsp+0xa0],xmm5
   0x00007ffff7a637fb <+75>:    movaps XMMWORD PTR [rsp+0xb0],xmm6
   0x00007ffff7a63803 <+83>:    movaps XMMWORD PTR [rsp+0xc0],xmm7
   0x00007ffff7a6380b <+91>:    lea    rax,[rsp+0xe0]
   0x00007ffff7a63813 <+99>:    mov    rsi,rdi
   0x00007ffff7a63816 <+102>:   lea    rdx,[rsp+0x8]
   0x00007ffff7a6381b <+107>:   mov    QWORD PTR [rsp+0x10],rax
   0x00007ffff7a63820 <+112>:   lea    rax,[rsp+0x20]
   0x00007ffff7a63825 <+117>:   mov    DWORD PTR [rsp+0x8],0x8
   0x00007ffff7a6382d <+125>:   mov    DWORD PTR [rsp+0xc],0x30
   0x00007ffff7a63835 <+133>:   mov    QWORD PTR [rsp+0x18],rax
   0x00007ffff7a6383a <+138>:   mov    rax,QWORD PTR [rip+0x36d70f]        # 0x7ffff7dd0f50
   0x00007ffff7a63841 <+145>:   mov    rdi,QWORD PTR [rax]
   0x00007ffff7a63844 <+148>:   call   0x7ffff7a5b130 <_IO_vfprintf_internal>
   0x00007ffff7a63849 <+153>:   add    rsp,0xd8
   0x00007ffff7a63850 <+160>:   ret    
End of assembler dump.

The two mov operations on eax (mov eax, 0x0) bother me a little as well, since I don't understand their role here (but I am more concerned with what I have just described). Thank you in advance.

Recommended answer

gcc is targeting the x86-64 System V ABI, used by all x86-64 systems other than Windows (for various historical reasons). Its calling convention passes the first few args in registers before falling back to the stack. (See also the Wikipedia basic summary of this calling convention.)
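
As a rough illustration of "the first few args in registers", here is a minimal sketch of the System V rule for integer-class arguments (ints and pointers) only. The function and table names are hypothetical. FP arguments, which use xmm0-xmm7 separately, and struct classification are deliberately left out.

```c
#include <assert.h>
#include <string.h>

/* x86-64 System V: the first six INTEGER-class arguments (ints, pointers)
   are passed in these registers, in this order. FP args are counted
   separately and go in xmm0-xmm7; they are not modelled here. */
static const char *const sysv_int_arg_regs[6] = {
    "rdi", "rsi", "rdx", "rcx", "r8", "r9"
};

/* Hypothetical helper: where does integer argument number `index`
   (0-based, counting only integer/pointer args) get passed? */
static const char *sysv_int_arg_location(int index)
{
    if (index >= 0 && index < 6)
        return sysv_int_arg_regs[index];
    return "stack";   /* the 7th integer arg and beyond go on the stack */
}
```

For printf("Hello, world!") the format-string pointer is integer argument 0, so `sysv_int_arg_location(0)` yields "rdi". That is the register the compiler fills with `mov edi,0x4005c4`.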

And yes, this is different from the crusty old 32-bit calling conventions that use the stack for everything. This is a Good Thing. See also the x86 tag wiki for more links to ABI docs, and tons of other stuff.
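
For contrast, the question's "parameters at positive offsets from the base pointer" picture comes from these stack-only conventions. This is a minimal sketch assuming classic 32-bit cdecl, the usual `push ebp; mov ebp, esp` prologue, and 4-byte (int or pointer) arguments. The function name is hypothetical.

```c
#include <assert.h>

/* 32-bit cdecl frame after `push ebp; mov ebp, esp`:
     [ebp + 0]   saved ebp
     [ebp + 4]   return address pushed by `call`
     [ebp + 8]   first argument (args are pushed right to left)
     [ebp + 12]  second argument, and so on.
   Assumes every argument occupies exactly 4 bytes. */
static int cdecl_arg_offset_from_ebp(int index)
{
    return 8 + 4 * index;
}
```

Under that convention, main would have done something like `push 0x4005c4` before the call, and printf would read the pointer from `[ebp+8]`.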

   0x0000000000400526: push   rbp
   0x0000000000400527: mov    rbp,rsp         # stack-frame boilerplate
   0x000000000040052a: mov    edi,0x4005c4    # first arg
   0x000000000040052f: mov    eax,0x0         # 0 FP args in vector registers
   0x0000000000400534: call   0x400400 <printf@plt>
   0x0000000000400539: mov    eax,0x0         # return 0.  If you'd compiled with optimization, this and the previous mov would be  xor eax,eax
   0x000000000040053e: pop    rbp             # clean up stack frame
   0x000000000040053f: ret

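In a non-PIE executable built with the default small code model, pointers to static data fit into 32 bits, which is why it can use mov edi, imm32 instead of movabs rdi, imm64 (writing edi zero-extends into the full rdi).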
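
The zero-extension rule is what makes the 32-bit form safe. Here is a small sketch of it, using the two addresses that appear in the question's gdb output. The helper names are hypothetical, and the model ignores the sign-extending `mov r64, imm32` encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Writing a 32-bit register such as edi clears bits 32..63 of the
   64-bit register (rdi): the old upper half is discarded, not kept. */
static uint64_t write_edi(uint64_t old_rdi, uint32_t imm32)
{
    (void)old_rdi;            /* previous contents do not survive */
    return (uint64_t)imm32;   /* upper 32 bits become zero */
}

/* An address can be loaded with `mov r32, imm32` only if its upper
   32 bits are already zero; otherwise `movabs r64, imm64` is needed. */
static bool fits_mov_r32_imm32(uint64_t addr)
{
    return addr <= UINT32_MAX;
}
```

0x4005c4 (the string in the executable) fits. 0x7ffff7a637b0 (where libc's __printf is mapped) does not.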

Floating-point args are passed in SSE registers (xmm0-xmm7), even to var-args functions. al indicates how many vector registers hold FP args (the ABI only requires it to be an upper bound), which is why main zeroes eax before calling printf. (Note that C's type promotion rules mean that float args to variadic functions are always promoted to double, which is why printf doesn't have any format specifiers for float, only double and long double.)
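
The float-to-double promotion can be seen from the callee's side in any variadic function. This is a minimal sketch; `sum_doubles` is a hypothetical name. On x86-64 System V, the compiler also sets al before each such call, but that part is not visible in C.

```c
#include <assert.h>
#include <stdarg.h>

/* Sums `count` floating-point variadic arguments. A float passed through
   `...` undergoes the default argument promotions and arrives as a double,
   so it must be fetched with va_arg(ap, double); asking for float here
   would be undefined behaviour. */
static double sum_doubles(int count, ...)
{
    va_list ap;
    double total = 0.0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, double);
    va_end(ap);
    return total;
}
```

Calling `sum_doubles(2, 1.5f, 2.5f)` passes two floats, but the callee reads two doubles, just as printf's `%f` does.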

printf@plt is like a stub function that calls the real printf function.

Yes, that's right. The Procedure Linkage Table (PLT) entry is an indirect jmp through a slot in the Global Offset Table (GOT). With lazy binding, that slot initially leads to a dynamic-linker routine that resolves the symbol and overwrites the GOT slot with the address where libc's printf definition is mapped, so later calls jump straight there. printf is an alias for __printf, which is why gdb chooses the __printf label for that address after you asked for disassembly of printf.
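
The lazy-binding mechanism can be imitated in plain C with a function pointer standing in for the GOT slot. This is only a user-space simulation: real PLT stubs are linker-generated machine code and the resolver lives in the dynamic linker. Every name below is hypothetical.

```c
#include <assert.h>

typedef int (*print_fn)(const char *);

static int resolver_calls = 0;   /* how often the "dynamic linker" ran */

/* Stands in for libc's printf: returns the character count it would print. */
static int real_print(const char *s)
{
    int n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

static int resolve_print(const char *s);

/* The "GOT slot": starts out pointing at the resolver, not the target. */
static print_fn got_print = resolve_print;

/* First call only: "look up" the symbol, patch the slot, forward the call. */
static int resolve_print(const char *s)
{
    resolver_calls++;
    got_print = real_print;      /* symbol lookup is hard-coded here */
    return got_print(s);
}

/* The "PLT stub": always an indirect jump through the slot. */
static int print_plt(const char *s)
{
    return got_print(s);
}
```

The first `print_plt("Hello, world!")` goes through the resolver. Every later call goes directly to `real_print`, because only the pointer changed, not the stub.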

Dump of assembler code for function __printf:
   0x00007ffff7a637b0 <+0>: sub    rsp,0xd8               # reserve space
   0x00007ffff7a637b7 <+7>: test   al,al                  # check if there were any FP args
   0x00007ffff7a637b9 <+9>: mov    QWORD PTR [rsp+0x28],rsi  # store the integer arg-passing registers to local scratch space
   0x00007ffff7a637be <+14>:    mov    QWORD PTR [rsp+0x30],rdx
   0x00007ffff7a637c3 <+19>:    mov    QWORD PTR [rsp+0x38],rcx
   0x00007ffff7a637c8 <+24>:    mov    QWORD PTR [rsp+0x40],r8
   0x00007ffff7a637cd <+29>:    mov    QWORD PTR [rsp+0x48],r9
   0x00007ffff7a637d2 <+34>:    je     0x7ffff7a6380b <__printf+91>  # skip storing the FP arg-passing regs if there were no FP args
   0x00007ffff7a637d4 <+36>:    movaps XMMWORD PTR [rsp+0x50],xmm0
   0x00007ffff7a637d9 <+41>:    movaps XMMWORD PTR [rsp+0x60],xmm1
   0x00007ffff7a637de <+46>:    movaps XMMWORD PTR [rsp+0x70],xmm2
   0x00007ffff7a637e3 <+51>:    movaps XMMWORD PTR [rsp+0x80],xmm3
   0x00007ffff7a637eb <+59>:    movaps XMMWORD PTR [rsp+0x90],xmm4
   0x00007ffff7a637f3 <+67>:    movaps XMMWORD PTR [rsp+0xa0],xmm5
   0x00007ffff7a637fb <+75>:    movaps XMMWORD PTR [rsp+0xb0],xmm6
   0x00007ffff7a63803 <+83>:    movaps XMMWORD PTR [rsp+0xc0],xmm7
       branch_target_from_test_je:
   0x00007ffff7a6380b <+91>:    lea    rax,[rsp+0xe0]            # some more stuff

So printf's implementation keeps the var-args handling simple by storing all the arg-passing registers (except the first one, which holds the format string), in order, into local arrays. It can then walk a pointer through them instead of needing switch-like code to extract the right integer or FP arg. It still needs to keep track of the first 5 integer and first 8 FP args, because they aren't contiguous with the rest of the args pushed onto the stack by the caller.
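
The bookkeeping is visible in the question's full dump. `mov DWORD PTR [rsp+0x8],0x8` and `mov DWORD PTR [rsp+0xc],0x30` initialise the offsets. The two `lea`/`mov` pairs then store pointers to the caller's stack args (`rsp+0xe0`) and to the register save area (`rsp+0x20`). Together these match the System V `va_list` record, which is passed to `_IO_vfprintf_internal` in rdx. Below is a hypothetical model of that record and of fetching the next integer argument. The FP path (fp_offset) and over-aligned types are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of the x86-64 System V va_list record (fields in ABI order).
   In __printf's dump: gp_offset = 8 (rdi's slot is skipped because it held
   the format string), fp_offset = 0x30 (FP saves start after 6 * 8 bytes),
   overflow_arg_area = rsp+0xe0, reg_save_area = rsp+0x20. */
struct sysv_va_list {
    unsigned int gp_offset;
    unsigned int fp_offset;
    void *overflow_arg_area;
    void *reg_save_area;
};

/* Fetch the next INTEGER-class argument, the way va_arg(ap, long) would:
   from the register save area while slots for rdi..r9 remain, then from
   the caller's stack area. */
static int64_t next_int_arg(struct sysv_va_list *ap)
{
    int64_t value;

    if (ap->gp_offset < 6 * 8) {
        memcpy(&value, (char *)ap->reg_save_area + ap->gp_offset, sizeof value);
        ap->gp_offset += 8;
    } else {
        memcpy(&value, ap->overflow_arg_area, sizeof value);
        ap->overflow_arg_area = (char *)ap->overflow_arg_area + 8;
    }
    return value;
}
```

With gp_offset starting at 8, the first five fetches come from the saved rsi..r9 slots, and only the sixth moves on to the stack. That switch is exactly the discontinuity the paragraph above describes.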

The Windows 64-bit calling convention's shadow space simplifies this by providing space for a function to dump its register args to the stack contiguous with the args already on the stack, but that's not worth wasting 32 bytes of stack on every call, IMO. (See my answer and comments on other answers on Why does Windows64 use a different calling convention from all other OSes on x86-64?)
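
The cost side of that trade-off is easy to put into numbers. This is a rough sketch that counts only integer/pointer args and ignores the padding needed to keep rsp 16-byte aligned. The function names are hypothetical.

```c
#include <assert.h>

/* Windows x64: 4 args in rcx, rdx, r8, r9, but the caller always reserves
   32 bytes of shadow space for them (even for zero-arg calls), plus 8 bytes
   for each additional argument. */
static int win64_outgoing_bytes(int nargs)
{
    int extra = nargs > 4 ? nargs - 4 : 0;
    return 32 + 8 * extra;
}

/* x86-64 System V: 6 args in registers and no home space; only the
   7th argument and beyond take stack space. */
static int sysv_outgoing_bytes(int nargs)
{
    int extra = nargs > 6 ? nargs - 6 : 0;
    return 8 * extra;
}
```

For the printf call in the question (one arg), that is 32 bytes on Windows versus 0 on System V.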
