GCC inline assembly with stack operations

Question

I need inline assembly code with these properties:

  • I have a pair of push/pop operations inside the assembly (so the stack is balanced)
  • I also have a variable in memory (so, not a register) as an input

Like this:

__asm__ __volatile__ ("push %%eax\n\t"
        // ... some operations that use ECX as a temporary
        "mov %0, %%ecx\n\t"
        // ... some other operation
        "pop %%eax"
: : "m"(foo));
// foo is my local variable, that is to say, on stack

When I disassemble the compiled code, the compiler has given the memory operand an address like 0xc(%esp), i.e. relative to ESP. That breaks this fragment. The push before the mov moves ESP down by 4 bytes, so 0xc(%esp) no longer points at foo. How can I tell the compiler not to address foo relative to ESP, but relative to EBP instead, e.g. -8(%ebp)?

P.S. You might suggest putting eax in the clobber list, but this is only sample code. I'd rather not go into the full reason why I can't accept that solution.

Recommended answer

Modifying ESP inside inline asm should generally be avoided when you have any memory inputs/outputs. That way you don't have to disable optimizations or otherwise force the compiler to make a stack frame with EBP. One major advantage is that you (or the compiler) can then use EBP as an extra free register. That is potentially a significant speedup if you're already having to spill/reload things. If you're writing inline asm, this is presumably a hotspot, so it's worth spending the extra code size on ESP-relative addressing modes (which need an extra SIB byte).

In x86-64 code there's an added obstacle to using push/pop safely: you can't tell the compiler that you want to clobber the red zone below RSP. (You can compile with -mno-red-zone, but there's no way to disable it from the C source.) If you push anyway, you can overwrite data the compiler has stored on the stack in the red zone. No 32-bit x86 ABI has a red zone, though, so this only applies to x86-64 System V (or to non-x86 ISAs with a red zone).
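If you really do need push/pop in x86-64 inline asm, one commonly used workaround is to move RSP below the red zone first and restore it afterwards. The sketch below assumes the x86-64 System V ABI, where the red zone is 128 bytes. The function name `push_pop_roundtrip` is hypothetical. The sketch deliberately uses only register operands. Any "m" operand the compiler picked could itself be RSP-relative, and it would be off by 128 bytes inside the asm.

```c
/* Hypothetical example: push/pop inside x86-64 inline asm without
   touching the 128-byte red zone the compiler may be using. */
static long push_pop_roundtrip(long x)
{
    long out;
#if defined(__x86_64__)
    __asm__ ("add   $-128, %%rsp\n\t"  /* step over the red zone; -128 fits in an imm8, +128 does not */
             "push  %[x]\n\t"          /* writes below the red zone, so no compiler data is clobbered */
             "pop   %[out]\n\t"
             "sub   $-128, %%rsp"      /* restore RSP before leaving the asm */
             : [out] "=r" (out)
             : [x] "r" (x)             /* register operands only: no RSP-relative "m" operands */
             : "cc");
#else
    out = x;                           /* fallback for other targets */
#endif
    return out;
}
```

For example, `push_pop_roundtrip(42)` returns 42. Using `add $-128` / `sub $-128` instead of `sub $128` / `add $128` keeps the immediate in the short sign-extended 8-bit encoding.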

You only need -fno-omit-frame-pointer for that function if you want to do asm-only things like using push as a stack data structure, so that the amount pushed varies. Or maybe if you're optimizing for code size.

You can always write a whole non-inline function in asm and put it in a separate file; then you have full control. But only do that if the function is large enough to be worth the call/ret overhead, e.g. if it includes a whole loop. Don't make the compiler call a short, non-looping function inside a C inner loop. That destroys all the call-clobbered registers and forces the compiler to make sure globals are in sync.
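A minimal sketch of that approach, assuming x86-64 with the System V calling convention on an ELF target (Linux/BSD). To keep the example in one file, it uses a GCC top-level (basic) asm statement. In a real project the same instructions would normally live in a separate `.S` file. `sum_ints` is a hypothetical name. The whole loop lives inside the asm function, so the call/ret overhead is paid once per array rather than once per element.

```c
/* Hypothetical: int sum_ints(const int *p, long n), written entirely in asm.
   System V x86-64: p arrives in RDI, n in RSI, result returned in EAX. */
#if defined(__x86_64__) && defined(__ELF__)
__asm__(".pushsection .text\n"
        ".globl sum_ints\n"
        ".type  sum_ints, @function\n"
        "sum_ints:\n\t"
        "xorl   %eax, %eax\n\t"     /* sum = 0 */
        "testq  %rsi, %rsi\n\t"
        "jle    2f\n"               /* n <= 0: return 0 */
        "1:\n\t"
        "addl   (%rdi), %eax\n\t"   /* sum += *p */
        "addq   $4, %rdi\n\t"       /* p++ */
        "decq   %rsi\n\t"
        "jnz    1b\n"               /* loop until n reaches 0 */
        "2:\n\t"
        "ret\n"
        ".size  sum_ints, .-sum_ints\n"
        ".popsection\n");
int sum_ints(const int *p, long n);
#else
/* Portable fallback so the example still builds on other targets. */
int sum_ints(const int *p, long n)
{
    int sum = 0;
    for (long i = 0; i < n; i++)
        sum += p[i];
    return sum;
}
#endif
```

The `.pushsection`/`.popsection` pair puts the assembler back in whatever section the compiler was using, which basic asm at file scope is expected to do.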

It seems you're using push/pop inside inline asm because you don't have enough registers and need to save/reload something. You don't need push/pop for save/restore. Instead, use dummy output operands with "=m" constraints to get the compiler to allocate stack space for you, and use mov to/from those slots. (Of course you're not limited to mov. If you only need the value once or twice, it can be a win to use the slot as a memory source operand for an ALU instruction.)
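Here is a smaller sketch of the same idea than the full example below, assuming an x86 target (32- or 64-bit) and AT&T syntax. The function `scale_and_add` is hypothetical. The `spill` variable exists only to give the asm a stack slot whose address the compiler chooses, so the asm never has to touch ESP/RSP. The final `addl` reads the slot directly as a memory source operand instead of reloading it with mov.

```c
/* Hypothetical example: computes a*123 + b, saving an intermediate
   result in a compiler-allocated stack slot instead of using push/pop. */
static int scale_and_add(int a, int b)
{
    int res;
    int spill;  /* dummy output: the compiler picks its address, e.g. 12(%esp) */
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("imull  $123, %[a], %[res]\n\t"  /* res = a * 123 */
             "movl   %[res], %[spill]\n\t"    /* save it to the slot (no push) */
             "movl   %[b], %[res]\n\t"        /* res can now be reused as scratch */
             "addl   %[spill], %[res]"        /* memory source operand: res = b + saved */
             : [res] "=&r" (res),             /* early-clobber: written before b is read */
               [spill] "=m" (spill)
             : [a] "rm" (a), [b] "rm" (b));
#else
    spill = a * 123;
    res = b + spill;
#endif
    return res;
}
```

For instance, `scale_and_add(2, 5)` is 2 * 123 + 5 = 251. The early-clobber on `res` matters: without it, the compiler could put `b` in the same register as `res`, and the first instruction would overwrite `b` before it is read.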

This may be slightly worse for code-size, but is usually not worse for performance (and can be better). If that's not good enough, write the whole function (or the whole loop) in asm so you don't have to wrestle with the compiler.

int foo(char *p, int a, int b) {
    int t1,t2;  // dummy output spill slots
    int r1,r2;  // dummy output tmp registers
    int res;

    asm ("# operands: %0  %1  %2  %3  %4  %5  %6  %7  %8\n\t"
         "imull  $123, %[b], %[res]\n\t"
         "mov   %[res], %[spill1]\n\t"
         "mov   %[a], %%ecx\n\t"
         "mov   %[b], %[tmp1]\n\t"  // let the compiler allocate tmp regs, unless you need specific regs e.g. for a shift count
         "mov   %[spill1], %[res]\n\t"
    : [res] "=&r" (res),
      [tmp1] "=&r" (r1), [tmp2] "=&r" (r2),  // early-clobber
      [spill1] "=m" (t1), [spill2] "=&rm" (t2)  // allow spilling to a register if there are spare regs
      , [p] "+&r" (p)
      , "+m" (*(char (*)[]) p) // dummy in/output instead of memory clobber
    : [a] "rmi" (a), [b] "rm" (b)  // a can be an immediate, but b can't
    : "ecx"
    );

    return res;

    // p unused in the rest of the function
    // so it's really just an input to the asm,
    // which the asm is allowed to destroy
}

This compiles to the following asm with gcc 7.3 -O3 -m32 on the Godbolt compiler explorer. Note the asm comment showing what the compiler picked for all the template operands. It chose 12(%esp) for %[spill1] and %edi for %[spill2]. Because I used "=&rm" for that operand, the compiler saved/restored %edi outside the asm and gave it to us for that dummy operand.

foo(char*, int, int):
    pushl   %ebp
    pushl   %edi
    pushl   %esi
    pushl   %ebx
    subl    $16, %esp
    movl    36(%esp), %edx
    movl    %edx, %ebp
#APP
# 19 "/tmp/compiler-explorer-compiler118120-55-w92ge8.v797i/example.cpp" 1
        # operands: %eax  %ebx  %esi  12(%esp)  %edi  %ebp  (%edx)  40(%esp)  44(%esp)
    imull  $123, 44(%esp), %eax
    mov   %eax, 12(%esp)
    mov   40(%esp), %ecx
    mov   44(%esp), %ebx
    mov   12(%esp), %eax

# 0 "" 2
#NO_APP
    addl    $16, %esp
    popl    %ebx
    popl    %esi
    popl    %edi
    popl    %ebp
    ret

Hmm, the dummy memory operand that tells the compiler which memory we modify seems to have resulted in dedicating a register to it: (%edx) for the memory operand, while p itself got %ebp. I guess that's because the p operand is early-clobber, so its register can't also be used for the memory operand's address. You could probably risk leaving off the early-clobber if you're confident none of the other inputs will use the same register as p (i.e. that they don't have the same value).
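The `"+m" (*(char (*)[]) p)` operand used in the example above is a general replacement for a blanket "memory" clobber. It tells the compiler that the asm touches the memory pointed to by `p`, without saying how many bytes. Below is a minimal standalone sketch of the same idiom, assuming GCC on x86 and that the direction flag is clear, which the x86 ABIs guarantee at function boundaries. The helper name `fill_bytes` is hypothetical.

```c
#include <string.h>

/* Hypothetical example: fill n bytes at dst using rep stosb. The dummy
   "+m" array operand describes exactly which memory the asm writes,
   instead of using a "memory" clobber. */
static void fill_bytes(void *dst, unsigned char value, size_t n)
{
#if defined(__i386__) || defined(__x86_64__)
    char *p = dst;
    __asm__ ("rep stosb"                  /* store AL at (E/R)DI, (E/R)CX times */
             : "+D" (p), "+c" (n),        /* stosb advances p and counts n down to 0 */
               "+m" (*(char (*)[]) dst)   /* dummy operand: the bytes pointed to by dst */
             : "a" (value));
#else
    memset(dst, value, n);                /* fallback for other targets */
#endif
}
```

Here the asm template never names the memory operand. It exists only to express the dependency, just like the `"+m"` operand in the larger example.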
