What prevents the usage of a function argument as hidden pointer?

Problem description

I'm trying to understand the implications of the System V AMD64 ABI's calling convention and am looking at the following example:

struct Vec3{
    double x, y, z;
};

struct Vec3 do_something(void);

void use(struct Vec3 * out){
    *out = do_something();
}

A Vec3 variable has class MEMORY, so the caller (use) must allocate space for the returned value and pass it as a hidden pointer to the callee (i.e. do_something). This is what we see in the resulting assembly (on godbolt, compiled with -O2):

use:
        pushq   %rbx
        movq    %rdi, %rbx           # remember out
        subq    $32, %rsp            # memory for returned object
        movq    %rsp, %rdi           # hidden pointer to %rdi
        call    do_something
        movdqu  (%rsp), %xmm0        # copy memory to out
        movq    16(%rsp), %rax
        movups  %xmm0, (%rbx)
        movq    %rax, 16(%rbx)
        addq    $32, %rsp            # unwind/restore
        popq    %rbx
        ret

I understand that an alias of the pointer out (e.g. a global variable) could be used in do_something, and thus out cannot be passed as the hidden pointer to do_something. If it were, out would be modified inside do_something rather than when do_something returns, so some calculations might become incorrect. For example, this version of do_something would return incorrect results:

struct Vec3 global; //initialized somewhere
struct Vec3 do_something(void){
   struct Vec3 res;
   res.x = 2*global.x; 
   res.y = global.y+global.x; 
   res.z = 0; 
   return res;
}

Suppose out were an alias of the variable global and were used as the hidden pointer passed in %rdi. Then res would also be an alias of global, because the compiler would use the memory pointed to by the hidden pointer directly (a kind of RVO in C) without creating a temporary object and copying it on return. In that case res.y would be 2*x+y (where x and y are the old values of global) rather than x+y, which is what it would be for any other hidden pointer.

It was suggested to me that using restrict should solve the problem, i.e.

void use(struct Vec3 *restrict out){
    *out = do_something();
}

because now the compiler knows that there are no aliases of out that could be used in do_something, so the assembly could be as simple as this:

use:
    jmp     do_something # %rdi is now the hidden pointer

However, neither gcc nor clang does this: the assembly stays unchanged (see on godbolt).

What prevents the usage of out as hidden pointer?


NB: The desired (or very similar) behavior is achieved with a slightly different function signature:

struct Vec3 use_v2(){
    return do_something();
}

which results in (see on godbolt):

use_v2:
    pushq   %r12
    movq    %rdi, %r12
    call    do_something
    movq    %r12, %rax
    popq    %r12
    ret

Solution

A function is allowed to assume that its return-value object (pointed to by the hidden pointer) is not the same object as anything else, i.e. that its output pointer (passed as a hidden first arg) doesn't alias anything.

You could think of this as the hidden first-arg output pointer having an implicit restrict on it. (In the C abstract machine, the return value is a separate object, and x86-64 System V specifies that the caller provides space for it. x86-64 SysV doesn't give the caller license to introduce aliasing.)

Using an otherwise-private local as the destination (instead of separate dedicated space and then copying to a real local) is fine, but pointers that may point to something reachable another way must not be used. This requires escape analysis to make sure that a pointer to such a local hasn't been passed outside of the function.

I think the x86-64 SysV calling convention models the C abstract machine here by having the caller provide a real return-value object. It does not force the callee to invent a temporary, when needed, to make sure all the writes to the retval happen after any other writes. That's not what "the caller provides space for the return value" means, IMO.

That's definitely how GCC and other compilers interpret it in practice, which is a big part of what matters in a calling convention that's been around this long (since a year or two before the first AMD64 silicon, so very early 2000s).


Here's a case where your optimization would break if it were done:

struct Vec3{
    double x, y, z;
};
struct Vec3 glob3;

__attribute__((noinline))
struct Vec3 do_something(void) {  // copy glob3 to retval in some order
    return (struct Vec3){glob3.y, glob3.z, glob3.x};
}

__attribute__((noinline))
void use(struct Vec3 * out){   // copy do_something() result to *out
    *out = do_something();
}


void caller(void) {
    use(&glob3);
}

With the optimization you're suggesting, do_something's output object would be glob3. But it also reads glob3.

A valid implementation for do_something would be to copy elements from glob3 to (%rdi) in source order, which would do glob3.x = glob3.y before reading glob3.x as the 3rd element of the return value.

That is in fact exactly what gcc -O1 does (Godbolt compiler explorer):

do_something:
    movq    %rdi, %rax               # tmp90, .result_ptr
    movsd   glob3+8(%rip), %xmm0      # glob3.y, glob3.y
    movsd   %xmm0, (%rdi)             # glob3.y, <retval>.x
    movsd   glob3+16(%rip), %xmm0     # glob3.z, _2
    movsd   %xmm0, 8(%rdi)            # _2, <retval>.y
    movsd   glob3(%rip), %xmm0        # glob3.x, _3
    movsd   %xmm0, 16(%rdi)           # _3, <retval>.z
    ret     

Notice the glob3.y, <retval>.x store before the load of glob3.x.

So without restrict anywhere in the source, GCC already emits asm for do_something that assumes no aliasing between the retval and glob3.


I don't think using struct Vec3 *restrict out would help at all: that only tells the compiler that inside use() you won't access the *out object through any other name. Since use() doesn't reference glob3, it's not UB to pass &glob3 as an arg to a restrict version of use.

I may be wrong here; @M.M argues in comments that *restrict out might make this optimization safe because the execution of do_something() happens during use(). (Compilers still don't actually do it, but maybe they would be allowed to for restrict pointers.)

Update: Richard Biener said in the GCC missed-optimization bug report that M.M is correct. If the compiler can prove that the function returns normally (not via an exception or longjmp), the optimization is legal in theory, but it's still not something GCC is likely to look for:

If so, restrict would make this optimization safe if we can prove that do_something is "noexcept" and doesn't longjmp.

Yes.

There's a noexcept declaration, but there isn't (AFAIK) a nolongjmp declaration you can put on a prototype.

So that means it's only possible (even in theory) as an inter-procedural optimization when we can see the other function's body. Unless noexcept also means no longjmp.
