For temporary registers in the asm statement, should I use clobber or dummy output?

Question

As the title says: when I modify some registers inside an asm statement purely as temporaries, which option is better, a clobber or a dummy output?

For example, I implemented two versions of the exchange function in the link, and found that both versions generate the same number of instructions.

Which version should I use? Should I use the one with the dummy output, to allow the compiler to choose the register so that it can optimize the entire function as much as possible?

If the answer is yes, then when should I use the clobber list? Is it only appropriate when an instruction requires its operands to be in specific registers, such as the syscall instruction, which requires its parameters to be in rdi, rsi, rdx, r10, r8, and r9?

Recommended answer

You should normally let the compiler pick registers for you, using an early-clobber dummy output with any required constraints (see footnote 1). This gives it flexibility to do register allocation for the function.

Footnote 1: e.g. you can use +&Q to get one of RAX/RBX/RCX/RDX: registers that have an AH/BH/CH/DH. If you wanted to unpack 8-bit fields with movzbl %h[input], %[high_byte] ; movzbl %b[input], %[low_byte] ; shr $16, %[input], you'd need a register that has its 2nd 8-bit chunk aliased to a high-8 register.
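As a rough illustration of the footnote, here is a minimal sketch that unpacks the two low bytes of a 32-bit value. The function name unpack_low_bytes and its parameters are made up for this example. It is a simplified variation: the input is a plain "Q" input rather than "+&Q", because this version never shifts it. The high-byte destination is also given a "Q" constraint, on the assumption that we want to keep the assembler from seeing something like movzbl %ah, %r8d, which cannot be encoded (AH/BH/CH/DH are unavailable in instructions that need a REX prefix).

```c
#include <stdint.h>

// Hypothetical helper (not from the original answer): split the low
// 16 bits of `value` into two zero-extended bytes.
static inline void unpack_low_bytes(uint32_t value,
                                    uint32_t *low_byte, uint32_t *high_byte) {
    uint32_t lo, hi;
    __asm__("movzbl %h[input], %[high_byte]\n\t"   // bits 15:8, e.g. %ah
            "movzbl %b[input], %[low_byte]"        // bits 7:0,  e.g. %al
            // high_byte is written while input is still needed, so it is
            // early-clobber; "Q" keeps it in a register usable next to %ah.
            : [high_byte] "=&Q"(hi), [low_byte] "=r"(lo)
            // "Q" restricts the input to a/b/c/d, the registers with a high-8 alias.
            : [input] "Q"(value));
    *low_byte = lo;
    *high_byte = hi;
}
```

The compiler still chooses which of the four registers to use for the input, so the constraint narrows the choice without fixing it.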

Out of curiosity: under the amd64 calling convention, some registers can be used freely inside a function, and we could implement some functions using only those registers inside the asm statement. Why is letting the compiler choose the registers better than that?

Because functions can inline, maybe into a loop that calls other functions, so the compiler would want to give it inputs in call-preserved registers. If you were writing a stand-alone function that the compiler always has to call, all you get from inline asm instead of stand-alone asm is the compiler handling calling-convention differences and C++ name-mangling.

Or maybe the surrounding code uses some instructions that require fixed registers, like cl for shift counts or RDX:RAX for div.
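To make the comparison concrete, here is a minimal sketch of an exchange written both ways. The question's actual code is behind a link that is not reproduced here, so swap_clobber and swap_dummy_output are hypothetical stand-ins, not the asker's functions. The first hard-codes RAX as the scratch register and declares it clobbered. The second lets the compiler pick the scratch register through an early-clobber dummy output.

```c
#include <stdint.h>

// Version A: fixed scratch register, declared in the clobber list.
// If the surrounding code has something live in RAX, the compiler
// must move it out of the way first.
static inline void swap_clobber(uint64_t *a, uint64_t *b) {
    uint64_t x = *a, y = *b;
    __asm__("mov %[x], %%rax\n\t"
            "mov %[y], %[x]\n\t"
            "mov %%rax, %[y]"
            : [x] "+r"(x), [y] "+r"(y)
            :
            : "rax");
    *a = x;
    *b = y;
}

// Version B: the scratch register is a dummy output the compiler allocates.
// "=&r" (early-clobber) says tmp is written before the asm has finished
// reading its inputs, so it must not share a register with them.
static inline void swap_dummy_output(uint64_t *a, uint64_t *b) {
    uint64_t x = *a, y = *b, tmp;
    __asm__("mov %[x], %[tmp]\n\t"
            "mov %[y], %[x]\n\t"
            "mov %[tmp], %[y]"
            : [x] "+r"(x), [y] "+r"(y), [tmp] "=&r"(tmp));
    *a = x;
    *b = y;
}
```

In isolation both versions are likely to compile to the same number of instructions, which matches what the question observed. The difference only shows up after inlining into code that wants RAX for something else.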

Quoting the question: "When should I use the clobber list? ... such as the syscall instruction, which requires its parameters to be in registers rdi, rsi, rdx, r10, r8, r9?"

Normally you'd use input constraints instead, so only the syscall instruction itself is inside the inline asm. But syscall (the instruction itself) clobbers RCX and R11, so system calls made using it unavoidably destroy user-space's RCX and R11. There's no point using dummy outputs for these, unless you have a use for the return address (RCX) or RFLAGS (R11). So yes, clobbers are useful here.

#include <stddef.h>
#include <asm/unistd.h>

// the compiler will emit all the necessary MOV instructions
//static inline 
size_t sys_write(int fd, const char *buf, size_t len) {
    size_t retval;
    asm volatile("syscall"
        : "=a"(retval)
        //                 EDI      RSI       RDX
        : "a"(__NR_write), "D"(fd), "S"(buf), "d"(len)
         , "m"(*(const char (*)[len]) buf)   // dummy memory input: the asm statement reads this memory
        : "rcx", "r11"    // clobbered by syscall
           // , "memory"  // would be needed if we didn't use a dummy memory input
    );
    return retval;
}

A non-inline version of this compiles as follows (with gcc -O3 on the Godbolt compiler explorer), because the function-calling convention nearly matches the system-call convention:

sys_write(int, char const*, unsigned long):
    movl    $1, %eax
    syscall
    ret

It would have been really silly to use clobbers on any of the input registers and put a mov inside the asm:

size_t dumb_sys_write(int fd, const char *buf, size_t len) {
    size_t retval;
    asm volatile(
        "mov %[fd], %%edi\n\t"
        "mov %[buf], %%rsi\n\t"
        "mov %[len], %%rdx\n\t"
        "syscall"
        : "=a"(retval)
        : "a"(__NR_write), [fd]"r"(fd), [buf]"r"(buf), [len]"r"(len)
         , "m"(*(char (*)[len]) buf)   // dummy memory input: the asm statement reads this memory
        : "rdi", "rsi", "rdx", "rcx", "r11"
           // , "memory"  // would be needed if we didn't use a dummy memory input
    );

    // if(retval > -4096ULL) errno = -retval;

    return retval;
}

dumb_sys_write(int, char const*, unsigned long):
    movl    %edi, %r9d
    movq    %rsi, %r8
    movq    %rdx, %r10
    movl    $1, %eax     # compiler generated before this
  # from inline asm
    mov %r9d, %edi
    mov %r8, %rsi
    mov %r10, %rdx
    syscall
  # end of inline asm
    ret

And besides that, you're not letting the compiler take advantage of the fact that syscall doesn't clobber any of its input registers. The compiler might well still want len in a register, and using a pure input constraint lets it know that the value will still be there afterwards.
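A small sketch of what that buys you, assuming x86-64 Linux. The names sys_write_inline and write_twice are invented for this example, and the wrapper is a lightly adapted copy of the input-constraint version above (it returns long so that negative error codes are easy to test). Because fd, buf and len are pure inputs, the compiler is free to leave them in EDI/RSI/RDX across the first syscall and reuse them for the second.

```c
#include <stddef.h>
#include <asm/unistd.h>   // __NR_write

// Input-constraint wrapper: only the syscall instruction is inside the asm.
// __asm__ is the spelling that also works under strict -std=c11.
static inline long sys_write_inline(int fd, const char *buf, size_t len) {
    long retval;
    __asm__ volatile("syscall"
        : "=a"(retval)
        : "a"(__NR_write), "D"(fd), "S"(buf), "d"(len),
          "m"(*(const char (*)[len]) buf)   // dummy memory input: buf is read
        : "rcx", "r11");                    // destroyed by the syscall instruction
    return retval;
}

// Hypothetical caller: issues the same write twice. Nothing tells the
// compiler that RDI/RSI/RDX were modified, so it does not have to reload them.
static long write_twice(int fd, const char *buf, size_t len) {
    long first = sys_write_inline(fd, buf, len);
    if (first < 0)
        return first;                       // negative errno value from the kernel
    long second = sys_write_inline(fd, buf, len);
    if (second < 0)
        return second;
    return first + second;
}
```

With the clobber-based version, the compiler would instead have to assume RDI, RSI and RDX were destroyed and set them up again before the second call.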

You might also use clobbers if you're using any instructions that implicitly use certain registers, but neither the input nor output of those instructions is a direct input or output of the asm statement. That would be rare, though, unless you're writing a whole loop or large block of code in inline asm.

Or maybe if you're wrapping a call instruction. (It's hard to do this safely, especially because of the red-zone, but people do try to do this). You don't get to choose which registers the code clobbers, so you just tell the compiler about it.
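The implicit-register case can be sketched as follows. This is a contrived example written for illustration, not code from the original answer: mod_sum is a hypothetical function that uses div twice inside one asm block. div implicitly reads RDX:RAX and writes RAX and RDX, but neither register carries an input or an output of the statement as a whole, so both are listed as clobbers.

```c
#include <stdint.h>

// Hypothetical helper: returns (a % m) + (b % m).
// Assumes m != 0 and that the sum does not overflow.
static inline uint64_t mod_sum(uint64_t a, uint64_t b, uint64_t m) {
    uint64_t result;
    __asm__("mov %[a], %%rax\n\t"
            "xor %%edx, %%edx\n\t"       // zero RDX: dividend is RDX:RAX
            "div %[m]\n\t"               // remainder lands in RDX
            "mov %%rdx, %[result]\n\t"
            "mov %[b], %%rax\n\t"
            "xor %%edx, %%edx\n\t"
            "div %[m]\n\t"
            "add %%rdx, %[result]"
            // result is written before b and m are last read: early-clobber.
            : [result] "=&r"(result)
            : [a] "r"(a), [b] "r"(b), [m] "r"(m)
            // RAX and RDX are only intermediates; the clobbers also stop the
            // compiler from placing a, b or m in them.
            : "rax", "rdx", "cc");
    return result;
}
```

In real code a single div would normally be expressed with "a" and "d" operands instead. Clobbers become the natural choice only when, as here, the fixed registers are scratch space in the middle of a longer block.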
