How can I indicate that the memory *pointed* to by an inline ASM argument may be used?


Question

Consider the following small function:

void foo(int* iptr) {
    iptr[10] = 1;
    __asm__ volatile ("nop"::"r"(iptr):);
    iptr[10] = 2;
}

With gcc, this compiles to:

foo:
        nop
        mov     DWORD PTR [rdi+40], 2
        ret

Note in particular that the first write through iptr, iptr[10] = 1, doesn't occur at all: the inline asm nop is the first thing in the function, and only the final write of 2 appears (after the asm statement). Apparently the compiler decides that it only needs to provide an up-to-date version of the value of iptr itself, but not of the memory it points to.

I can tell the compiler that memory must be up to date with a "memory" clobber, like so:

void foo(int* iptr) {
    iptr[10] = 1;
    __asm__ volatile ("nop"::"r"(iptr):"memory");
    iptr[10] = 2;
}

This results in the expected code:

foo:
        mov     DWORD PTR [rdi+40], 1
        nop
        mov     DWORD PTR [rdi+40], 2
        ret

However, this is too strong a condition, since it tells the compiler that all memory must be up to date. For example, in the following function:

void foo2(int* iptr, long* lptr) {
    iptr[10] = 1;
    lptr[20] = 100;
    __asm__ volatile ("nop"::"r"(iptr):);
    iptr[10] = 2;
    lptr[20] = 200;
}

The desired behavior is to let the compiler optimize away the first write to lptr[20], but not the first write to iptr[10]. The "memory" clobber cannot achieve this, because it means both writes have to occur:

foo2:
        mov     DWORD PTR [rdi+40], 1
        mov     QWORD PTR [rsi+160], 100 ; lptr[20] written unnecessarily
        nop
        mov     DWORD PTR [rdi+40], 2
        mov     QWORD PTR [rsi+160], 200
        ret

Is there some way to tell compilers that accept gcc extended asm syntax that the input to the asm includes the pointer and anything it can point to?

Recommended answer

That's correct; asking for a pointer as input to inline asm does not imply that the pointed-to memory is also an input or output or both. With a register input and register output, for all gcc knows your asm just aligns a pointer by masking off the low bits, or adds a constant to it. (In which case you would want it to optimize away a dead store.)

The simple option is asm volatile and a "memory" clobber (see footnote 1).

The narrower, more specific way you're asking for is to use a "dummy" memory operand as well as the pointer in a register. Your asm template doesn't reference this operand (except maybe inside an asm comment to see what the compiler picked). It tells the compiler which memory you actually read, write, or read+write.

Dummy memory input: "m" (*(const int (*)[]) iptr)
or output: "=m" (*(int (*)[]) iptr). Or of course "+m" with the same syntax.
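As a minimal sketch of the dummy operand described above, here is foo2 from the question with an extra "m" input. The name foo2_narrow is made up for this illustration, and the asm template is still just "nop", so it never references the memory operand. Whether the first store to lptr[20] is actually dropped depends on the compiler version and options, so check the generated asm.

```c
/* Sketch: foo2 from the question, with a dummy "m" input that declares
   the memory reachable through iptr as an input of the asm statement.
   foo2_narrow is a hypothetical name used only for this example. */
void foo2_narrow(int *iptr, long *lptr) {
    iptr[10] = 1;     /* must be stored: it is covered by the "m" input */
    lptr[20] = 100;   /* not an input; the compiler stays free to drop it */
    __asm__ volatile ("nop"
        :                                   /* no outputs */
        : "r" (iptr),                       /* the pointer value itself */
          "m" (*(const int (*)[]) iptr));   /* the pointed-to memory */
    iptr[10] = 2;
    lptr[20] = 200;
}
```

Compared with the "memory" clobber version, only the int array behind iptr is declared as read by the asm, which is what leaves the store to lptr[20] eligible for dead-store elimination.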

That syntax casts to a pointer-to-array and dereferences it, so the actual input is a C array. (If you actually have an array, not a pointer, you don't need any casting and can just ask for it as a memory operand.)
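For the case where a real array is in scope, the following sketch passes the array itself as the memory operand, with no cast. The names samples and record_samples are hypothetical, and the "nop" template stands in for whatever asm would actually read the array.

```c
/* Sketch: a real array can be named directly as an "m" operand.
   samples and record_samples are hypothetical names. */
static int samples[4];

int record_samples(void) {
    samples[0] = 1;   /* must reach memory before the asm: samples is an input */
    __asm__ volatile ("nop" : : "m" (samples));
    samples[0] = 2;
    return samples[0];
}
```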

If you leave the size unspecified with [], that tells GCC that any memory accessed relative to that pointer is an input, output, or in/out operand. If you use [10] or [some_variable], that tells the compiler the specific size. With runtime-variable sizes, gcc in practice misses the optimization that iptr[size+1] is not part of the input.
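The two sized forms can be sketched as follows. The function names are made up for illustration, the template is again a placeholder "nop", and the runtime-sized variant assumes n is at least 1, because a zero-length variable-length array type is not valid C.

```c
#include <stddef.h>

/* Fixed size: only the first 10 ints behind iptr are declared as an input.
   touch_first_ten is a hypothetical name. */
void touch_first_ten(int *iptr) {
    iptr[3] = 1;
    __asm__ volatile ("nop"
        :
        : "r" (iptr), "m" (*(const int (*)[10]) iptr));
    iptr[3] = 2;
}

/* Runtime size: the first n ints are declared as an input, using a
   pointer to a variable-length array type. Assumes n >= 1. */
void touch_first_n(int *iptr, size_t n) {
    iptr[0] = 1;
    __asm__ volatile ("nop"
        :
        : "r" (iptr), "m" (*(const int (*)[n]) iptr));
    iptr[0] = 2;
}
```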

GCC documents this and therefore supports it. I think it's not a strict-aliasing violation if the array element type is the same as the pointer's, or maybe if it's char.

(from the GCC manual)
An x86 example where the string memory argument is of unknown length.

   asm("repne scasb"
    : "=c" (count), "+D" (p)
    : "m" (*(const char (*)[]) p), "0" (-1), "a" (0));
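To see the manual's fragment in context, here is a sketch that wraps it in a complete function. The name asm_strlen, the conversion of the final count to a length, and the fallback to strlen on non-x86 targets are additions for this illustration. It assumes the direction flag is clear on entry, as the usual x86 calling conventions guarantee.

```c
#include <stddef.h>
#include <string.h>

/* Sketch: string length via "repne scasb", following the GCC manual
   example. asm_strlen is a hypothetical name. */
size_t asm_strlen(const char *s) {
#if defined(__x86_64__) || defined(__i386__)
    size_t count;
    const char *p = s;
    __asm__ ("repne scasb"
        : "=c" (count), "+D" (p)
        : "m" (*(const char (*)[]) p),   /* dummy input: the whole string */
          "0" ((size_t)-1),              /* initial count, shares the "=c" register */
          "a" (0)                        /* byte to search for: the terminator */
        : "cc");
    /* The count register is decremented once per byte scanned, including
       the terminator, so it ends at -(len + 2). */
    return ~count - 1;
#else
    return strlen(s);   /* portable fallback so the sketch builds elsewhere */
#endif
}
```

No volatile or "memory" clobber is needed here, because the dummy "m" operand already tells the compiler that the string contents are an input.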

If you can avoid using an early-clobber on the pointer input operand, the dummy memory input operand will typically pick a simple addressing mode using that same register.

But if you do use an early-clobber for strict correctness of an asm loop, sometimes a dummy operand will make gcc waste instructions (and an extra register) on a base address for the memory operand. Check the asm output of the compiler.
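The combination discussed here, an asm loop with early-clobber pointer operands plus a dummy memory input, might look like the sketch below. The function name sum_ints and the loop body are assumptions for illustration, not code from the answer. The asm is x86-only (AT&T syntax), with a plain C loop as fallback on other targets.

```c
#include <stddef.h>

/* Sketch: sum an int array in an asm loop. sum_ints is a hypothetical
   name. The early-clobber ("&") operands keep the compiler from sharing
   their registers with any input; the dummy "m" input declares the array
   contents as read. */
int sum_ints(const int *arr, size_t len) {
    int sum = 0;
#if defined(__x86_64__) || defined(__i386__)
    const int *p = arr;
    const int *end = arr + len;
    if (len == 0)
        return 0;   /* the loop below always runs at least once */
    __asm__ (
        "1:\n\t"
        "addl (%[p]), %[sum]\n\t"   /* sum += *p */
        "add  $4, %[p]\n\t"         /* p++ (sizeof(int) is 4 on x86) */
        "cmp  %[end], %[p]\n\t"
        "jne  1b"
        : [sum] "+&r" (sum), [p] "+&r" (p)
        : [end] "r" (end),
          "m" (*(const int (*)[]) arr)   /* dummy input, never referenced */
        : "cc");
#else
    for (size_t i = 0; i < len; i++)   /* portable fallback */
        sum += arr[i];
#endif
    return sum;
}
```

Because p is early-clobbered, the compiler may set up a separate base register for the dummy operand even though the template never uses it, which is the wasted work the paragraph above warns about.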

Leaving out this memory operand (or a "memory" clobber) is a widespread bug in inline-asm examples. It often goes undetected because the asm is wrapped in a function that doesn't inline into any callers that would tempt the compiler into reordering stores, merging them, or doing dead-store elimination.

GNU C inline asm syntax is designed around describing a single instruction to the compiler. The intent is that you tell the compiler about a memory input or memory output with an "m" or "=m" operand constraint, and it picks the addressing mode.

Writing whole loops in inline asm requires care to make sure the compiler really knows what's going on (or asm volatile plus a "memory" clobber). Otherwise you risk breakage when changing the surrounding code, or when enabling link-time optimization that allows cross-file inlining.

See also "Looping over arrays with inline assembly" for using an asm statement as the loop body, still doing the loop logic in C. With actual (non-dummy) "m" and "=m" operands, the compiler can unroll the loop by using displacements in the addressing modes it chooses.
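A small sketch of that style: the loop stays in C and the asm statement describes a single instruction with a real "+m" operand. The function name add_one_each is hypothetical, and the instruction is x86-only (AT&T syntax), with plain C as a fallback on other targets.

```c
#include <stddef.h>

/* Sketch: one instruction per asm statement, loop logic in C.
   add_one_each is a hypothetical name. The compiler picks the addressing
   mode for arr[i], so it can unroll and use displacements. */
void add_one_each(int *arr, size_t n) {
    for (size_t i = 0; i < n; i++) {
#if defined(__x86_64__) || defined(__i386__)
        __asm__ ("addl $1, %0"
            : "+m" (arr[i])   /* read-modify-write memory operand */
            :
            : "cc");
#else
        arr[i] += 1;          /* portable fallback */
#endif
    }
}
```

Since the only memory the instruction touches is its explicit operand, neither volatile nor a "memory" clobber is needed.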

Footnote 1: A "memory" clobber gets the compiler to treat the asm like a non-inline function call (that could read or write any memory except for locals that escape analysis has proved have not escaped). The escape analysis includes input operands to the asm statement itself, but also any global or static variables that any earlier call could have stored pointers into. So usually local loop counters don't have to be spilled/reloaded around an asm statement with a "memory" clobber.
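The footnote's point can be sketched with an empty asm statement used purely as a compiler barrier. The names shared_flag and spin_count are made up for illustration. The local count never has its address taken, so the barrier does not force it out of a register, while the store to the global has to be emitted before each barrier.

```c
/* Sketch: a "memory" clobber as a compiler barrier.
   shared_flag and spin_count are hypothetical names. */
int shared_flag;

int spin_count(int iterations) {
    int count = 0;   /* address never taken: does not escape */
    for (int i = 0; i < iterations; i++) {
        shared_flag = i;                      /* global: stored before the barrier */
        __asm__ volatile ("" ::: "memory");   /* empty template, barrier only */
        count++;
    }
    return count;
}
```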

asm volatile is necessary to make sure the asm isn't optimized away even if its output operands are unused (because you require the undeclared side effect of writing memory to happen).

Or, for memory that is only read by the asm, you need the asm to run again if the same input buffer contains different input data. Without volatile, the asm statement could be CSEd out of a loop. (A "memory" clobber does not make the optimizer treat all memory as an input when considering whether the asm statement even needs to run.)

An asm statement with no output operands is implicitly volatile, but it's a good idea to make it explicit. (The GCC manual has a section on asm volatile.)

For example, asm("... sum an array ..." : "=r"(sum) : "r"(pointer), "r"(end_pointer) : "memory") has an output operand, so it is not implicitly volatile. If you used it like this:

 arr[5] = 1;
 total += asm_sum(arr, len);
 memcpy(arr, foo, len);
 total += asm_sum(arr, len);

Without volatile, the second asm_sum could be optimized away, on the assumption that the same asm with the same input operands (pointer and length) will produce the same output. You need volatile for any asm that's not a pure function of its explicit input operands. If it isn't optimized away, then the "memory" clobber will have the desired effect of requiring memory to be in sync.
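A sketch of what such an asm_sum wrapper could look like with asm volatile and a "memory" clobber follows. The answer elides the asm body, so the loop here is an assumption for illustration, and it uses read-write early-clobber operands rather than the exact operand list quoted above. It is x86-only (AT&T syntax), with a plain C loop as fallback on other targets.

```c
#include <stddef.h>

/* Sketch: asm_sum with volatile plus a "memory" clobber. The loop body
   is hypothetical; the answer does not show it. */
int asm_sum(const int *arr, size_t len) {
    int sum = 0;
#if defined(__x86_64__) || defined(__i386__)
    const int *p = arr;
    const int *end = arr + len;
    if (len == 0)
        return 0;   /* the loop below always runs at least once */
    __asm__ volatile (      /* volatile: rerun even when the operands repeat */
        "1:\n\t"
        "addl (%[p]), %[sum]\n\t"
        "add  $4, %[p]\n\t"
        "cmp  %[end], %[p]\n\t"
        "jne  1b"
        : [sum] "+&r" (sum), [p] "+&r" (p)
        : [end] "r" (end)
        : "memory", "cc");  /* "memory": the array contents must be in sync */
#else
    for (size_t i = 0; i < len; i++)   /* portable fallback */
        sum += arr[i];
#endif
    return sum;
}
```

With this form, two calls with the same pointer and length both execute, so a memcpy into the array between them is reflected in the second result.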
