在扩展内联ASM中调用printf [英] Calling printf in extended inline ASM

查看:265
本文介绍了在扩展内联ASM中调用printf的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图在64位Linux上的 GCC 中的扩展内联ASM中输出相同的字符串两次。

  int main()
{
const char * test =test\\\
;
$ b asm(
movq%[test],%% rdi\\\
//调试器显示rdi = *字符串地址*
movq $ 0,%% rax \\\


push %% rbp\\\

push %% rbx\\\

call printf\\\

pop %% rbx\\\

pop %% rbp\\\


movq%[test],%% rdi\\\
//调试器显示rdi = 0
movq $ 0,%% rax\\\


push %% rbp\\\

push %% rbx\\\

调用printf\\\

pop %% rbx\\\

pop %% rbp\\\


:[test]g(test )
:rax,rbx,rcx,rdx,rdi,rsi,rsp
);

返回0;
}

现在,字符串只输出一次。我尝试了很多东西,但我想我错过了关于调用约定的一些警告。我甚至不确定clobber列表是否正确,或者我是否需要保存和恢复 RBP RBX



为什么字符串没有输出两次?



用调试器看一看,当某个字符串被加载到 rdi 第二次具有值 0 而不是字符串的实际地址。

我无法解释为什么,看起来在第一次调用堆栈后会损坏?我需要以某种方式恢复它吗?

解决方案

您的代码存在特定问题: RDI 没有在一个函数调用中维护(见下文)。在第一次调用 printf 但被 printf 打乱之前是正确的。您需要先将其暂时存储在别处。一个没有被破坏的寄存器将会很方便。您可以在 printf 之前保存副本,然后将其复制回 RDI

< hr>

我不建议按照您的建议进行操作(在内联汇编器中进行函数调用)。编译器很难优化。

其中, 64位系统V ABI 规定了一个128字节的红色区域。这意味着如果没有潜在的损坏,你不能将任何东西推入堆栈。请记住:执行 CALL 会在堆栈上压入返回地址。解决这个问题的快速和肮脏的方法是,当内联汇编器启动时,从 RSP 中减去128,然后在完成时添加128.


%rsp指向的位置之外的128字节区域被认为是
被保留,不应该被信号或中断处理程序修改.8因此,
函数可能会使用这个在功能
呼叫中不需要的临时数据区域。特别是,叶函数可以在整个栈帧中使用这个区域
,而不是在序言和尾声中调整堆栈指针。这个区域被称为红色区域

另一个值得关注的问题是要求堆栈是在任何函数调用之前,对齐16字节(或可能根据参数对齐32字节)。这也是64位ABI所必需的:


输入参数区域的末尾应与16位(32位,如果__m256是
传入堆栈)字节边界。换句话说,当控制权转移到函数入口点时,值(%rsp + 8)总是
a 16(32)倍。

注意:对于 CALL 的16字节对齐函数的这种要求也是 GCC > = 4.5:


$ b $在32位Linux上需要.org / wiki / X86_calling_conventions#cdeclrel =nofollow noreferrer b


在C编程语言的上下文中,函数参数以相反的顺序压入堆栈。在Linux中,GCC设置了调用约定的事实标准。自GCC版本4.5以来,调用函数时堆栈必须与16字节的边界对齐(以前的版本只需要4字节的对齐方式)。

由于我们在内联汇编程序中调用 printf ,因此我们应该确保在调用之前将堆栈与16字节边界对齐。



您还必须知道,在调用函数时,某些函数调用会保留一些寄存器,而有些则不会。具体来说,那些可能被函数调用破坏的函数在64位ABI的图3.4中列出(请参阅上一个链接)。这些寄存器是 RAX RCX RDX RD8 - RD11 XMM0 - XMM15 MMX0 - MMX7 ST0 - ST7 >。这些都可能被销毁,所以如果它们没有出现在输入和输出约束中,应该放在clobber列表中。

以下代码应该满足大部分条件以确保调用另一个函数的内联汇编程序不会无意中破坏寄存器,保留redzone并在调用之前保持16字节的对齐方式:

  int main()
{
const char * test =test\\\
;
long dummyreg; / * dummyreg允许GCC选择可用的寄存器* /

__asm__ __volatile__(
add $ -128,%% rsp\\\
\t/ *跳过当前的redzone * /
mov %% rsp,%[temp] \\\
\t/ *将RSP复制到可用的寄存器* /
和$ -16,%% rsp\\\
\ t/ *将堆栈与16字节边界对齐* /
mov%[test],%% rdi \\\
\t/ * RDI是字符串的地址* /
xor% %eax,%% eax\\\
\t/ *变量函数集AL。这种情况下0 * /
call printf\\\
\t
mov%[test], %% rdi\\\
\t/ * RDI是字符串的地址* /
xor %% eax,%% eax\\\
\t/ *变量函数集AL。 * /
call printf\\\
'\\t
mov%[temp],%% rsp\\\
\t/ * Restore RSP * /
sub $ -128,%% rsp\\\
\t/ *将128添加到RSP以恢复到原* /
:[temp]=& r(dummyreg)/ *允许GCC选择有效在所有输入消耗之前修改
以便使用&对于早期的clobber * /
:[test]r(test),/ *选择可用的寄存器作为输入操作数* /
m(test)/ *虚拟约束确保测试数组$在内联
汇编执行之前,b $ b完全在内存中实现* /
:rax,rcx,rdx,rsi,rdi,r8,r9 ,r10,r11,
xmm0,xmm1,xmm2,xmm3,xmm4,xmm5,xmm6,xmm7,
xmm8,xmm9,xmm10,xmm11,xmm12,xmm13,xmm14,xmm15,
mm0,mm1,mm2,mm3 mm2mm6mm6
stst(1)st(2)st(3)st( 4),st(5),st(6),st(7)
);

返回0;
}

我使用输入约束来允许模板选择一个可用的寄存器用于传递 str 地址。这确保我们有一个寄存器来存储调用 printf 之间的 str 地址。我还通过使用一个虚拟寄存器,让汇编程序模板临时选择一个可用的位置来存储 RSP 。所选择的寄存器将不包括任何已经被选择/列为输入/输出/触发操作数的寄存器。

这看起来非常混乱,但如果程序变得越来越复杂,如果没有正确地做到,可能会导致问题。这就是为什么在内联汇编器中调用符合System V 64位ABI的函数通常不是最好的做法。


I'm trying to output the same string twice in extended inline ASM in GCC, on 64-bit Linux.

int main()
{
    const char* test = "test\n";

    asm(
        "movq %[test], %%rdi\n"    // Debugger shows rdi = *address of string*  
        "movq $0, %%rax\n"

        "push %%rbp\n"
        "push %%rbx\n"
        "call printf\n"         
        "pop %%rbx\n"
        "pop %%rbp\n"

        "movq %[test], %%rdi\n" // Debugger shows rdi = 0
        "movq $0, %%rax\n"

        "push %%rbp\n"
        "push %%rbx\n"
        "call printf\n"     
        "pop %%rbx\n"
        "pop %%rbp\n"
        : 
        :  [test] "g" (test)
        : "rax", "rbx","rcx", "rdx", "rdi", "rsi", "rsp"
        );

    return 0;
}

Now, the string is outputted only once. I have tried many things, but I guess I am missing some caveats about the calling convention. I'm not even sure if the clobber list is correct or if I need to save and restore RBP and RBX at all.

Why is the string not outputted twice?

Looking with a debugger shows me that somehow when the string is loaded into rdi for the second time it has the value 0 instead of the actual address of the string.

I cannot explain why, it seems like after the first call the stack is corrupted? Do I have to restore it in some way?

解决方案

Specific problem to your code: RDI is not maintained across a function call (see below). It is correct before the first call to printf but is clobbered by printf. You'll need to temporarily store it elsewhere first. A register that isn't clobbered will be convenient. You can then save a copy before printf, and copy it back to RDI after.


I do not recommend doing what you are suggesting (making function calls in inline assembler). It will be very difficult for the compiler to optimize things.

Among other things the 64-bit System V ABI mandates a 128-byte red zone. That means you can't push anything onto the stack without potential corruption. Remember: doing a CALL pushes a return address on the stack. Quick and dirty way to resolve this problem is to subtract 128 from RSP when your inline assembler starts and then add 128 back when finished.

The 128-byte area beyond the location pointed to by %rsp is considered to be reserved and shall not be modified by signal or interrupt handlers.8 Therefore, functions may use this area for temporary data that is not needed across function calls. In particular, leaf functions may use this area for their entire stack frame, rather than adjusting the stack pointer in the prologue and epilogue. This area is known as the red zone.

Another issue to be concerned about is the requirement for the stack to be 16-byte aligned (or possibly 32-byte aligned depending on the parameters) prior to any function call. This is required by the 64-bit ABI as well:

The end of the input argument area shall be aligned on a 16 (32, if __m256 is passed on stack) byte boundary. In other words, the value (%rsp + 8) is always a multiple of 16 (32) when control is transferred to the function entry point.

Note: This requirement for 16-byte alignment upon a CALL to a function is also required on 32-bit Linux for GCC >= 4.5:

In context of the C programming language, function arguments are pushed on the stack in the reverse order. In Linux, GCC sets the de facto standard for calling conventions. Since GCC version 4.5, the stack must be aligned to a 16-byte boundary when calling a function (previous versions only required a 4-byte alignment.)

Since we call printf in inline assembler we should ensure that we align the stack to a 16-byte boundary before making the call.

You also have to be aware that when calling a function some registers are preserved across a function call and some are not. Specifically those that may be clobbered by a function call are listed in Figure 3.4 of the 64-bit ABI (see previous link). Those registers are RAX, RCX, RDX, RD8-RD11, XMM0-XMM15, MMX0-MMX7, ST0-ST7 . These are all potentially destroyed so should be put in the clobber list if they don't appear in the input and output constraints.

The following code should satisfy most of the conditions to ensure that inline assembler that calls another function will not inadvertently clobber registers, preserves the redzone, and maintains 16-byte alignment before a call:

int main()
{
    const char* test = "test\n";
    long dummyreg; /* dummyreg used to allow GCC to pick available register */

    __asm__ __volatile__ (
        "add $-128, %%rsp\n\t"   /* Skip the current redzone */
        "mov %%rsp, %[temp]\n\t" /* Copy RSP to available register */
        "and $-16, %%rsp\n\t"    /* Align stack to 16-byte boundary */
        "mov %[test], %%rdi\n\t" /* RDI is address of string */
        "xor %%eax, %%eax\n\t"   /* Variadic function set AL. This case 0 */
        "call printf\n\t"
        "mov %[test], %%rdi\n\t" /* RDI is address of string again */
        "xor %%eax, %%eax\n\t"   /* Variadic function set AL. This case 0 */
        "call printf\n\t"
        "mov %[temp], %%rsp\n\t" /* Restore RSP */
        "sub $-128, %%rsp\n\t"   /* Add 128 to RSP to restore to orig */
        :  [temp]"=&r"(dummyreg) /* Allow GCC to pick available output register. Modified
                                    before all inputs consumed so use & for early clobber*/
        :  [test]"r"(test),      /* Choose available register as input operand */
           "m"(test)             /* Dummy constraint to make sure test array
                                    is fully realized in memory before inline
                                    assembly is executed */
        : "rax", "rcx", "rdx", "rsi", "rdi", "r8", "r9", "r10", "r11",
          "xmm0","xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7",
          "xmm8","xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14", "xmm15",
          "mm0","mm1", "mm2", "mm3", "mm4", "mm5", "mm6", "mm6",
          "st", "st(1)", "st(2)", "st(3)", "st(4)", "st(5)", "st(6)", "st(7)"
        );

    return 0;
}

I used an input constraint to allow the template to choose an available register to be used to pass the str address through. This ensures that we have a register to store the str address between the calls to printf. I also get the assembler template to choose an available location for storing RSP temporarily by using a dummy register. The registers chosen will not include any one already chosen/listed as an input/output/clobber operand.

This looks very messy, but failure to do it correctly could lead to problems later as you program becomes more complex. This is why calling functions that conform to the System V 64-bit ABI within inline assembler is generally not the best way to do things.

这篇关于在扩展内联ASM中调用printf的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆