How can I indicate that the memory *pointed* to by an inline ASM argument may be used?


Problem description

Consider the following small function:

void foo(int* iptr) {
    iptr[10] = 1;
    __asm__ volatile ("nop"::"r"(iptr):);
    iptr[10] = 2;
}

Using gcc, this compiles to:

foo:
        nop
        mov     DWORD PTR [rdi+40], 2
        ret

Note in particular that the first write, iptr[10] = 1, doesn't occur at all: the inline asm nop is the first thing in the function, and only the final write of 2 appears (after the asm statement). Apparently the compiler decides that it only needs to provide an up-to-date value of iptr itself, not the memory it points to.

I can tell the compiler that memory must be up to date with a memory clobber, like so:

void foo(int* iptr) {
    iptr[10] = 1;
    __asm__ volatile ("nop"::"r"(iptr):"memory");
    iptr[10] = 2;
}

which results in the expected code:

foo:
        mov     DWORD PTR [rdi+40], 1
        nop
        mov     DWORD PTR [rdi+40], 2
        ret

However, this condition is too strong, since it tells the compiler that all memory has to be written. For example, in the following function:

void foo2(int* iptr, long* lptr) {
    iptr[10] = 1;
    lptr[20] = 100;
    __asm__ volatile ("nop"::"r"(iptr):);
    iptr[10] = 2;
    lptr[20] = 200;
}

The desired behavior is to let the compiler optimize away the first write to lptr[20], but not the first write to iptr[10]. A "memory" clobber cannot achieve this because it means both writes have to occur. With the clobber added to the asm statement above, the output is:

foo2:
        mov     DWORD PTR [rdi+40], 1
        mov     QWORD PTR [rsi+160], 100 ; lptr[20] written unnecessarily
        nop
        mov     DWORD PTR [rdi+40], 2
        mov     QWORD PTR [rsi+160], 200
        ret

Is there some way to tell compilers accepting gcc extended asm syntax that the input to the asm includes the pointer and anything it can point to?

Solution

That's correct; asking for a pointer as input to inline asm does not imply that the pointed-to memory is also an input or output or both. With a register input and register output, for all gcc knows your asm just aligns a pointer by masking off the low bits, or adds a constant to it. (In which case you would want it to optimize away a dead store.)

The simple option is asm volatile and a "memory" clobber (see Footnote 1).

The narrower, more specific approach you're asking for is to use a "dummy" memory operand in addition to the pointer in a register. Your asm template doesn't reference this operand (except maybe inside an asm comment to see what the compiler picked). It tells the compiler which memory you actually read, write, or read+write.

Dummy memory input: "m" (*(const int (*)[]) iptr)
or output: "=m" (*(int (*)[]) iptr). Or of course "+m" with the same syntax.

That syntax is casting to a pointer-to-array and dereferencing, so the actual input is a C array. (If you actually have an array, not pointer, you don't need any casting and can just ask for it as a memory operand.)
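As a minimal sketch, here is the dummy input operand applied to the foo2 function from the question. The name foo2_dummy is hypothetical, and the empty asm template is only a stand-in for real instructions that would read through iptr.

```c
/* Sketch: declare that the asm reads the memory iptr points to,
   without resorting to a blanket "memory" clobber. */
void foo2_dummy(int *iptr, long *lptr) {
    iptr[10] = 1;    /* covered by the "m" input, so it must be stored before the asm */
    lptr[20] = 100;  /* not an operand of the asm, so the compiler may treat it as dead */
    __asm__ volatile (
        ""           /* placeholder for instructions that read through iptr */
        :            /* no outputs */
        : "r" (iptr),
          "m" (*(const int (*)[]) iptr)   /* dummy input: memory reached via iptr */
    );
    iptr[10] = 2;
    lptr[20] = 200;
}
```

Whether the first store to lptr[20] is actually dropped depends on the compiler version and optimization level, so it is worth checking the generated asm.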

If you leave the size unspecified with [], that tells GCC that any memory accessed relative to that pointer is an input, output, or in/out operand. If you use [10] or [some_variable], that tells the compiler the specific size. With runtime-variable sizes, gcc in practice misses the optimization that iptr[size+1] is not part of the input.
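The sized forms might look like the sketch below. The function names touch_fixed and touch_n are hypothetical, the empty asm templates stand in for real instructions, and touch_n assumes n is greater than zero because the cast uses a variably modified array type.

```c
#include <stddef.h>

/* Fixed size: the asm is declared to read exactly iptr[0] through iptr[10]. */
void touch_fixed(int *iptr) {
    iptr[10] = 1;
    __asm__ volatile ("" : : "r" (iptr), "m" (*(const int (*)[11]) iptr));
    iptr[10] = 2;
}

/* Runtime size: the asm is declared to read iptr[0] through iptr[n-1].
   Assumes n > 0. */
void touch_n(int *iptr, size_t n) {
    __asm__ volatile ("" : : "r" (iptr), "m" (*(const int (*)[n]) iptr));
}
```

A fixed size gives the optimizer the most room to reason about stores that fall outside the declared range.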

GCC documents this and therefore supports it. I think it's not a strict-aliasing violation if the array element type is the same as the type the pointer points to, or maybe if it's char.

(from the GCC manual)
An x86 example where the string memory argument is of unknown length.

   asm("repne scasb"
    : "=c" (count), "+D" (p)
    : "m" (*(const char (*)[]) p), "0" (-1), "a" (0));
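
For context, the manual's fragment could be wrapped in a function roughly like this. The name scasb_strlen, the conversion of the final count into a length, the explicit "cc" clobber, and the strlen fallback for non-x86-64 targets are all assumptions of this sketch, not part of the manual's example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical strlen-like wrapper around the manual's repne scasb example. */
size_t scasb_strlen(const char *s) {
#if defined(__x86_64__) && defined(__GNUC__)
    const char *p = s;
    size_t count;
    __asm__ ("repne scasb"
             : "=c" (count), "+D" (p)
             : "m" (*(const char (*)[]) p),  /* dummy input: the string itself */
               "0" ((size_t)-1),             /* rcx starts at -1 (no length limit) */
               "a" (0)                       /* al = 0, the byte to search for */
             : "cc");
    /* rcx was decremented once per scanned byte, terminator included. */
    return ~count - 1;
#else
    return strlen(s);   /* portable fallback for other targets */
#endif
}
```

Because the string is declared as a memory input, the statement does not need volatile or a "memory" clobber: the compiler knows that stores into the buffer must happen before the asm runs.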

If you can avoid using an early-clobber on the pointer input operand, the dummy memory input operand will typically pick a simple addressing mode using that same register.

But if you do use an early-clobber for strict correctness of an asm loop, sometimes a dummy operand will make gcc waste instructions (and an extra register) on a base address for the memory operand. Check the asm output of the compiler.


Background:

This is a widespread bug in inline-asm examples which often goes undetected because the asm is wrapped in a function that doesn't inline into any callers that tempt the compiler into reordering stores for merging or doing dead-store elimination.

GNU C inline asm syntax is designed around describing a single instruction to the compiler. The intent is that you tell the compiler about a memory input or memory output with a "m" or "=m" operand constraint, and it picks the addressing mode.

Writing whole loops in inline asm requires care to make sure the compiler really knows what's going on (or falling back on asm volatile plus a "memory" clobber). Otherwise you risk breakage when changing the surrounding code, or when enabling link-time optimization that allows cross-file inlining.

See also "Looping over arrays with inline assembly" for using an asm statement as the loop body, still doing the loop logic in C. With actual (non-dummy) "m" and "=m" operands, the compiler can unroll the loop by using displacements in the addressing modes it chooses.
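
A small sketch of that style, with the loop in C and a single instruction per asm statement. The function name increment_all is hypothetical, the instruction is x86 in AT&T syntax, and the plain C fallback for other targets is an addition of this sketch.

```c
#include <stddef.h>

/* Hypothetical helper: add 1 to every element, one asm statement per element. */
void increment_all(int *arr, size_t n) {
    for (size_t i = 0; i < n; i++) {
#if defined(__x86_64__) || defined(__i386__)
        /* Real "+m" operand: the compiler chooses the addressing mode for arr[i]
           and knows exactly which memory is read and written. */
        __asm__ ("addl $1, %0" : "+m" (arr[i]) : : "cc");
#else
        arr[i] += 1;   /* portable fallback for other targets */
#endif
    }
}
```

No volatile or "memory" clobber is needed here, because the only memory the asm touches is the operand it names.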


Footnote 1: A "memory" clobber gets the compiler to treat the asm like a non-inline function call (that could read or write any memory except for locals that escape analysis has proved have not escaped). The escape analysis includes input operands to the asm statement itself, but also any global or static variables that any earlier call could have stored pointers into. So usually local loop counters don't have to be spilled/reloaded around an asm statement with a "memory" clobber.

asm volatile is necessary to make sure the asm isn't optimized away even if its output operands are unused (because you require the undeclared side effect of writing memory to happen).

Or, for memory that is only read by the asm, you need the asm to run again if the same input buffer contains different input data. Without volatile, the asm statement could be CSEd out of a loop. (A "memory" clobber does not make the optimizer treat all memory as an input when considering whether the asm statement even needs to run.)

asm with no output operands is implicitly volatile, but it's a good idea to make it explicit. (The GCC manual has a section on asm volatile).

For example, asm("... sum an array ..." : "=r"(sum) : "r"(pointer), "r"(end_pointer) : "memory") has an output operand, so it is not implicitly volatile. Suppose you used it like this:

 arr[5] = 1;
 total += asm_sum(arr, len);
 memcpy(arr, foo, len);
 total += asm_sum(arr, len);

Without volatile, the second asm_sum could be optimized away, on the assumption that the same asm with the same input operands (pointer and length) will produce the same output. You need volatile for any asm that's not a pure function of its explicit input operands. If it isn't optimized away, then the "memory" clobber will have the desired effect of requiring memory to be in sync.
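
One way asm_sum might be written so that both calls run is sketched below. The loop body is an assumption of this sketch (the original elides it), written for x86-64 in AT&T syntax with a plain C fallback elsewhere. The early-clobber "+&r" constraints keep the registers that are modified from sharing a register with the end input.

```c
#include <stddef.h>

/* Hypothetical asm_sum: volatile plus a "memory" clobber, as described above. */
int asm_sum(const int *arr, size_t len) {
    int sum = 0;
#if defined(__x86_64__) && defined(__GNUC__)
    const int *end = arr + len;
    __asm__ volatile (
        "jmp 2f\n"
        "1:\n\t"
        "addl (%[p]), %[sum]\n\t"    /* sum += *p */
        "addq $4, %[p]\n"            /* p++ (4-byte int) */
        "2:\n\t"
        "cmpq %[end], %[p]\n\t"      /* loop until p == end */
        "jne 1b"
        : [sum] "+&r" (sum), [p] "+&r" (arr)
        : [end] "r" (end)
        : "memory", "cc");
#else
    for (size_t i = 0; i < len; i++)   /* portable fallback for other targets */
        sum += arr[i];
#endif
    return sum;
}
```

Replacing the "memory" clobber with a dummy "m" input for the array would be the narrower alternative, and would also make volatile unnecessary because the asm would then be a pure function of its declared inputs.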
