Why does my "=r"(var) output not pick the same register as "a"(var) input?


Problem description

I'm learning how to use __asm__ volatile in GCC and ran into a problem. I want to implement a function that performs an atomic compare-and-exchange and returns the value that was previously stored in the destination.

Why does an "=a"(expected) output constraint work, while an "=r"(expected) constraint lets the compiler generate code that doesn't work?

Case 1.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

uint64_t atomic_cas(uint64_t * destination, uint64_t expected, uint64_t value){
    __asm__ volatile (
        "lock cmpxchgq %3, %1":
        "=a" (expected) :
        "m" (*destination), "a" (expected), "r" (value) :
        "memory"
    );

    return expected;
}

int main(void){
    uint64_t v1 = 10;
    uint64_t result = atomic_cas(&v1, 10, 5);
    printf("%" PRIu64 "\n", result);           //prints 10, the value before, OK
    printf("%" PRIu64 "\n", v1);               //prints 5, the new value, OK
}

It works as expected. Now consider the following case:

Case 2.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

uint64_t atomic_cas(uint64_t * destination, uint64_t expected, uint64_t value){
    __asm__ volatile (
        "lock cmpxchgq %3, %1":
        "=r" (expected) ://<----- I replaced a with r, expecting GCC to infer the register from the inputs
        "m" (*destination), "a" (expected), "r" (value) :
        "memory"
    );

    return expected;
}

int main(void){
    uint64_t v1 = 10;
    uint64_t result = atomic_cas(&v1, 10, 5);
    printf("%" PRIu64 "\n", result);            //prints 5, wrong
    printf("%" PRIu64 "\n", v1);                //prints 5, the new value, OK 
}

I examined the generated assembly and noticed the following:

I. In both cases the function code is the same and looks like this:

   0x0000555555554760 <+0>:     mov    rax,rsi
   0x0000555555554763 <+3>:     lock cmpxchg QWORD PTR [rdi],rdx
   0x0000555555554768 <+8>:     ret 

II. The problem appeared when GCC inlined atomic_cas: in the latter case the correct value was not passed to printf. Here is the relevant fragment of disas main:

0x00000000000005f6 <+38>:    lock cmpxchg QWORD PTR [rsp],rdx
0x00000000000005fc <+44>:    lea    rsi,[rip+0x1f1]        # 0x7f4
0x0000000000000603 <+51>:    mov    rdx,rax ;  <-----This instruction is absent in the Case 2.
0x0000000000000606 <+54>:    mov    edi,0x1
0x000000000000060b <+59>:    xor    eax,eax

QUESTION: Why does replacing rax (a) with an arbitrary register (r) produce a wrong result? I expected it to work in both cases.

UPD. I compile with the following flags: -Wl,-z,lazy -Warray-bounds -Wextra -Wall -g3 -O3

Recommended answer

First of all, see https://gcc.gnu.org/wiki/DontUseInlineAsm. There is basically zero reason to roll your own CAS instead of using bool __atomic_compare_exchange(type *ptr, type *expected, type *desired, bool weak, int success_memorder, int failure_memorder) (https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html). This works even on non-_Atomic variables.
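As a rough sketch of what the builtin route could look like for this function: the version below uses __atomic_compare_exchange_n, the by-value sibling of the builtin named above. The name atomic_cas_builtin and the choice of __ATOMIC_SEQ_CST for both orderings are assumptions made for illustration, not something the question specifies.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns the value that was stored in *destination
   before the operation, like the asker's atomic_cas. */
uint64_t atomic_cas_builtin(uint64_t *destination, uint64_t expected, uint64_t value)
{
    /* On failure the builtin writes the current contents of *destination
       into expected. On success expected already equals the old value.
       Either way, expected ends up holding the previous contents. */
    __atomic_compare_exchange_n(destination, &expected, value,
                                false,             /* strong CAS, no spurious failure */
                                __ATOMIC_SEQ_CST,  /* assumed ordering on success */
                                __ATOMIC_SEQ_CST); /* assumed ordering on failure */
    return expected;
}
```

With v1 == 10, calling atomic_cas_builtin(&v1, 10, 5) should return 10 and leave 5 in v1, matching what Case 1 prints.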

"= r" 告诉gcc可以在所需的任何寄存器中请求输出,因此可以避免必须将结果本身 mov .(就像这里的GCC希望RSI中的输出作为printf的arg).和/或因此,它可以避免破坏输入到同一寄存器中的输入.这就是 = r 的重点,而不是特定寄存器的约束.

"=r" tells gcc it can ask for the output in whatever register it wants, so it can avoid having to mov the result there itself. (Like here where GCC wants the output in RSI as an arg for printf). And/or so it can avoid destroying the input it put in the same register. That's the entire point of =r instead of specific-register constraints.

If you want to tell GCC that the register it picks for the input is also the output register, use "+r". Or in this case, since you need it to pick RAX, use "+a"(expected).
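Applied to the function from the question, the "+a" suggestion might look like the sketch below. The name atomic_cas_fixed, the "+m" on the memory operand (cmpxchg both reads and writes it), and the non-x86-64 fallback branch are additions made here so the example stands alone; they are not part of the original code.

```c
#include <stdint.h>

/* Hypothetical corrected version of the asker's atomic_cas. */
uint64_t atomic_cas_fixed(uint64_t *destination, uint64_t expected, uint64_t value)
{
#if defined(__x86_64__)
    __asm__ volatile (
        "lock cmpxchgq %2, %1"
        : "+a" (expected),      /* %0: RAX, compared on input, old value on output */
          "+m" (*destination)   /* %1: memory operand, read and possibly written */
        : "r" (value)           /* %2: replacement value, any register */
        : "memory", "cc"
    );
#else
    /* Assumption: on other targets fall back to the builtin so the sketch
       still compiles. The inline asm above is x86-64 only. */
    __atomic_compare_exchange_n(destination, &expected, value, 0,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
#endif
    return expected;
}
```

Because expected is now a single read-write operand pinned to RAX, the compiler knows the result is in RAX even after inlining, so it should emit the mov that was missing in Case 2 whenever it needs the value elsewhere.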

There's already syntax for making the compiler pick the same register for two constraints, with separate variables for input and output: matching constraints, as in "=r"(outvar) : "0"(invar).
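A minimal way to watch a matching constraint at work is an empty template: since "0" forces the input into the register chosen for operand 0, the output variable simply receives the input's value. The function name pass_through is made up for this sketch.

```c
#include <stdint.h>

/* Hypothetical demo: the asm emits no instructions, yet outvar ends up
   equal to invar because both operands share one register. */
uint64_t pass_through(uint64_t invar)
{
    uint64_t outvar;
    __asm__ (""
             : "=r" (outvar)   /* operand 0: any general-purpose register */
             : "0" (invar));   /* input tied to the register of operand 0 */
    return outvar;
}
```

Writing "=r"(outvar) : "r"(invar) instead would give no such guarantee. The compiler would be free to pick two different registers, which is the situation in Case 2.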

It would be a missed optimization if the syntax didn't let you describe a non-destructive instruction that can produce its output in a different register from its input(s).

You can see what GCC actually picked by using the operands in an asm comment.

Remember that GNU C inline asm is just text substitution into your template. The compiler literally has no idea what the asm instructions do, and doesn't even check that they're valid. (That only happens when the assembler reads the compiler output.)

    ...
    asm volatile (
    "lock cmpxchgq %3, %1   # 0 out: %0  |  2 in: %2" 
    : ...
    ...

The resulting asm shows the problem very clearly (Godbolt GCC7.4):

        lock cmpxchgq %rsi, (%rsp)   # 0 out: %rsi  |  2 in: %rax
        leaq    .LC0(%rip), %rdi
        xorl    %eax, %eax
        call    printf@PLT

(I used AT&T syntax so that your cmpxchgq %reg,mem matches the mem,reg operand order documented by Intel, although both GAS and clang's built-in assembler seem to accept the other order, too. Also because of the operand-size suffix.)

GCC takes the opportunity to ask for the "=r"(expected) output in RSI, as an arg for printf. Your bug is that your template wrongly assumes %0 will expand to rax.

There are lots of examples of the lack of any implicit connection between an input and an output that happen to use the same C variable. For example, you can swap two C variables with an empty asm statement, just using constraints: see "How to write a short block of inline gnu extended assembly to swap the values of two integer variables?"
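The swap trick mentioned above can be sketched roughly as follows. The wrapper swap_ints is a hypothetical name; the only real work is done by the constraints, which cross-wire each input to the other variable's output register.

```c
/* Hypothetical wrapper around the constraint-only swap. */
void swap_ints(int *x, int *y)
{
    int a = *x;
    int b = *y;
    __asm__ (""                 /* empty template: no instructions emitted */
             : "=r" (a),        /* operand 0: output into a */
               "=r" (b)         /* operand 1: output into b */
             : "0" (b),         /* old b is loaded into operand 0's register */
               "1" (a));        /* old a is loaded into operand 1's register */
    *x = a;
    *y = b;
}
```

Nothing in the template touches the registers. The values change places only because the compiler is told that a comes out of the register b went into, and vice versa, which illustrates how inputs and outputs are connected by constraints rather than by sharing a C variable name.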
