What prevents the usage of a function argument as hidden pointer?

Problem description

I'm trying to understand the implications of the System V AMD64 ABI's calling convention, and I'm looking at the following example:

struct Vec3{
    double x, y, z;
};

struct Vec3 do_something(void);

void use(struct Vec3 * out){
    *out = do_something();
}

A Vec3 variable is of class MEMORY, so the caller (use) must allocate space for the returned value and pass it as a hidden pointer to the callee (i.e. do_something). This is what we see in the resulting assembly (on godbolt, compiled with -O2):

use:
        pushq   %rbx
        movq    %rdi, %rbx           # remember out
        subq    $32, %rsp            # memory for returned object
        movq    %rsp, %rdi           # hidden pointer goes in %rdi
        call    do_something
        movdqu  (%rsp), %xmm0        # copy memory to out
        movq    16(%rsp), %rax
        movups  %xmm0, (%rbx)
        movq    %rax, 16(%rbx)
        addq    $32, %rsp            # unwind/restore
        popq    %rbx
        ret

I understand that an alias of the pointer out (e.g. a global variable) could be used inside do_something, and thus out cannot be passed as the hidden pointer to do_something: if it were, *out would be modified inside do_something rather than when do_something returns, so some calculations might become faulty. For example, this version of do_something would return faulty results:

struct Vec3 global; //initialized somewhere
struct Vec3 do_something(void){
   struct Vec3 res;
   res.x = 2*global.x; 
   res.y = global.y+global.x; 
   res.z = 0; 
   return res;
}

If out were an alias for the global variable global and were passed as the hidden pointer in %rdi, res would also be an alias of global, because the compiler would use the memory pointed to by the hidden pointer directly (a kind of RVO in C) without creating a temporary object and copying it on return. Then res.y would be 2*x+y (where x and y are the old values of global.x and global.y) rather than x+y, as it would be for any other hidden pointer.
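The effect can be reproduced in plain C by writing the hidden pointer out by hand. This is only a sketch of what the lowered code might look like, not actual compiler output: do_something_lowered, reset_global and the two helper functions are names made up for the illustration, and the initial values 1, 10, 100 are arbitrary.

```c
struct Vec3 { double x, y, z; };

struct Vec3 global;

/* Arbitrary starting values: old x = 1, old y = 10. */
static void reset_global(void) {
    global = (struct Vec3){1.0, 10.0, 100.0};
}

/* Hand-written model of do_something after lowering: `ret` plays the
   role of the hidden pointer in %rdi, and the result is written
   straight into *ret with no temporary (the "RVO in C" case). */
static void do_something_lowered(struct Vec3 *ret) {
    ret->x = 2 * global.x;
    ret->y = global.y + global.x;
    ret->z = 0;
}

/* Hidden pointer is a private temporary: res.y is x + y. */
double y_with_private_temp(void) {
    struct Vec3 tmp;
    reset_global();
    do_something_lowered(&tmp);
    return tmp.y;
}

/* Hidden pointer aliases global: global.x is already doubled when
   res.y is computed, so res.y becomes 2*x + y. */
double y_with_aliased_out(void) {
    reset_global();
    do_something_lowered(&global);
    return global.y;
}
```

With these starting values the first helper yields 11 (x + y) and the second yields 12 (2*x + y), which is the discrepancy described above.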

It was suggested to me that using restrict should solve the problem, i.e.

void use(struct Vec3 *restrict out){
    *out = do_something();
}

because now the compiler knows that there are no aliases of out which could be used in do_something, so the assembly could be as simple as this:

use:
    jmp     do_something # %rdi is now the hidden pointer

However, this is the case for neither gcc nor clang: the assembly stays unchanged (see on godbolt).

What prevents the usage of out as hidden pointer?


NB: The desired (or very similar) behavior would be achieved for a slightly different function-signature:

struct Vec3 use_v2(){
    return do_something();
}

which results in (see on godbolt):

use_v2:
    pushq   %r12
    movq    %rdi, %r12
    call    do_something
    movq    %r12, %rax
    popq    %r12
    ret

Solution

A function is allowed to assume its return-value object (pointed-to by a hidden pointer) is not the same object as anything else. i.e. that its output pointer (passed as a hidden first arg) doesn't alias anything.

You could think of this as the hidden first arg output pointer having an implicit restrict on it. (Because in the C abstract machine, the return value is a separate object, and the x86-64 System V ABI specifies that the caller provides space. x86-64 SysV doesn't give the caller license to introduce aliasing.)

Using an otherwise-private local as the destination (instead of separate dedicated space and then copying to a real local) is fine, but pointers that may point to something reachable another way must not be used. This requires escape analysis to make sure that a pointer to such a local hasn't been passed outside of the function.
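A rough illustration of that distinction follows. The names sum_components and store_result are made up for this sketch, and do_something is given a trivial body only so that the example is self-contained.

```c
struct Vec3 { double x, y, z; };

struct Vec3 global = {1.0, 2.0, 3.0};

/* Trivial stand-in body, assumed for the sketch. */
struct Vec3 do_something(void) {
    return (struct Vec3){global.y, global.z, global.x};
}

/* `v` is a private local whose address never leaves this function, so
   do_something() has no other way to reach it. A compiler is free to
   pass &v itself as the hidden pointer and skip the extra copy. */
double sum_components(void) {
    struct Vec3 v = do_something();
    return v.x + v.y + v.z;
}

/* `out` comes from outside and may point at something do_something()
   can also reach (for example &global). The result therefore has to be
   built in separate space and copied to *out after the call returns. */
void store_result(struct Vec3 *out) {
    *out = do_something();
}
```

Calling store_result(&global) is well-defined C: the return value is a separate object, so global ends up holding its old y, z, x values in that order. That is exactly the guarantee the caller would break by forwarding out as the hidden pointer.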

I think the x86-64 SysV calling convention models the C abstract machine here by having the caller provide a real return-value object. The alternative would force the callee to invent a temporary whenever it needed to make sure all the writes to the retval happened after any other writes, and that's not what "the caller provides space for the return value" means, IMO.

That's definitely how GCC and other compilers interpret it in practice, which is a big part of what matters in a calling convention that's been around this long (since a year or two before the first AMD64 silicon, so very early 2000s).


Here's a case where your optimization would break if it were done:

struct Vec3{
    double x, y, z;
};
struct Vec3 glob3;

__attribute__((noinline))
struct Vec3 do_something(void) {  // copy glob3 to retval in some order
    return (struct Vec3){glob3.y, glob3.z, glob3.x};
}

__attribute__((noinline))
void use(struct Vec3 * out){   // copy do_something() result to *out
    *out = do_something();
}


void caller(void) {
    use(&glob3);
}

With the optimization you're suggesting, do_something's output object would be glob3. But it also reads glob3.

A valid implementation for do_something would be to copy elements from glob3 to (%rdi) in source order, which would do glob3.x = glob3.y before reading glob3.x as the 3rd element of the return value.

That is in fact exactly what gcc -O1 does (Godbolt compiler explorer):

do_something:
    movq    %rdi, %rax               # tmp90, .result_ptr
    movsd   glob3+8(%rip), %xmm0      # glob3.y, glob3.y
    movsd   %xmm0, (%rdi)             # glob3.y, <retval>.x
    movsd   glob3+16(%rip), %xmm0     # glob3.z, _2
    movsd   %xmm0, 8(%rdi)            # _2, <retval>.y
    movsd   glob3(%rip), %xmm0        # glob3.x, _3
    movsd   %xmm0, 16(%rdi)           # _3, <retval>.z
    ret     

Notice the glob3.y, <retval>.x store before the load of glob3.x.

So without restrict anywhere in the source, GCC already emits asm for do_something that assumes no aliasing between the retval and glob3.


I don't think using struct Vec3 *restrict out would help at all: that only tells the compiler that inside use() you won't access the *out object through any other name. Since use() doesn't reference glob3, it's not UB to pass &glob3 as an arg to a restrict version of use.

I may be wrong here; @M.M argues in comments that *restrict out might make this optimization safe because the execution of do_something() happens during use(). (Compilers still don't actually do it, but maybe they would be allowed to for restrict pointers.)

Update: Richard Biener said in the GCC missed-optimization bug-report that M.M is correct, and if the compiler can prove that the function returns normally (not exception or longjmp), the optimization is legal in theory (but still not something GCC is likely to look for):

If so, restrict would make this optimization safe if we can prove that do_something is "noexcept" and doesn't longjmp.

Yes.

There's a noexcept declaration, but there isn't (AFAIK) a nolongjmp declaration you can put on a prototype.

So that means it's only possible (even in theory) as an inter-procedural optimization when we can see the other function's body. Unless noexcept also means no longjmp.
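To see why a non-local exit matters, here is a small sketch in which the hidden pointer is again written out by hand. Everything in it is hypothetical: do_something_lowered, use_copying, use_forwarding and x_after_longjmp are names invented for the illustration, and the callee is contrived to write part of its result and then longjmp away instead of returning.

```c
#include <setjmp.h>

struct Vec3 { double x, y, z; };

static jmp_buf env;

/* Model of a lowered do_something that writes part of its result
   through the hidden pointer and then leaves via longjmp. */
static void do_something_lowered(struct Vec3 *ret) {
    ret->x = 42.0;      /* partial write to the return slot */
    longjmp(env, 1);    /* never returns normally */
}

/* What compilers emit today: private temporary, copy after the call. */
static void use_copying(struct Vec3 *out) {
    struct Vec3 tmp;
    do_something_lowered(&tmp);
    *out = tmp;         /* not reached when the callee longjmps */
}

/* The hypothetical optimization: forward out as the hidden pointer. */
static void use_forwarding(struct Vec3 *out) {
    do_something_lowered(out);
}

/* Returns target.x as seen after the call was abandoned via longjmp. */
double x_after_longjmp(int forward) {
    static struct Vec3 target;  /* static, so its value is well-defined after longjmp */
    target = (struct Vec3){1.0, 2.0, 3.0};
    if (setjmp(env) == 0) {
        if (forward)
            use_forwarding(&target);
        else
            use_copying(&target);
    }
    return target.x;
}
```

In the abstract machine the assignment *out = do_something() never happens if do_something does not return, so the copying version leaves target.x at 1. The forwarding version leaves 42 behind, which is an observable difference even when out is restrict-qualified.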
