What prevents the usage of a function argument as hidden pointer?
Problem description
I am trying to understand the implications of the System V AMD64 ABI's calling convention, and I am looking at the following example:
struct Vec3{
    double x, y, z;
};
struct Vec3 do_something(void);
void use(struct Vec3 * out){
    *out = do_something();
}
A Vec3 variable is of class MEMORY, so the caller (use) must allocate space for the returned value and pass it as a hidden pointer to the callee (i.e. do_something). That is what we see in the resulting assembly (on godbolt, compiled with -O2):
use:
pushq %rbx
movq %rdi, %rbx ;remember out
subq $32, %rsp ;memory for returned object
movq %rsp, %rdi ;hidden pointer to %rdi
call do_something
movdqu (%rsp), %xmm0 ;copy memory to out
movq 16(%rsp), %rax
movups %xmm0, (%rbx)
movq %rax, 16(%rbx)
addq $32, %rsp ;unwind/restore
popq %rbx
ret
I understand that an alias of the pointer out (e.g. a global variable) could be used in do_something, and thus out cannot be passed as the hidden pointer to do_something: if it were, *out would be changed inside do_something rather than when do_something returns, so some calculations might become faulty. For example, this version of do_something would return faulty results:
struct Vec3 global; //initialized somewhere
struct Vec3 do_something(void){
    struct Vec3 res;
    res.x = 2*global.x;
    res.y = global.y+global.x;
    res.z = 0;
    return res;
}
If out were an alias for the global variable global and were used as the hidden pointer passed in %rdi, res would also be an alias of global, because the compiler would use the memory pointed to by the hidden pointer directly (a kind of RVO in C), without actually creating a temporary object and copying it on return. Then res.y would be 2*x+y (where x, y are the old values of global) and not x+y, as it would be for any other hidden pointer.
It was suggested to me that using restrict should solve the problem, i.e.
void use(struct Vec3 *restrict out){
    *out = do_something();
}
because now the compiler knows that there are no aliases of out that could be used in do_something, so the assembly could be as simple as this:
use:
jmp do_something ; %rdi is now the hidden pointer
However, this is not the case for either gcc or clang: the assembly stays unchanged (see on godbolt).
What prevents the usage of out as the hidden pointer?
NB: The desired (or very similar) behavior is achieved with a slightly different function signature:
struct Vec3 use_v2(){
    return do_something();
}
which results in (see on godbolt):
use_v2:
pushq %r12
movq %rdi, %r12
call do_something
movq %r12, %rax
popq %r12
ret
A function is allowed to assume that its return-value object (pointed to by a hidden pointer) is not the same object as anything else, i.e. that its output pointer (passed as a hidden first arg) doesn't alias anything.
You could think of this as the hidden first arg output pointer having an implicit restrict on it. (Because in the C abstract machine, the return value is a separate object, and the x86-64 System V ABI specifies that the caller provides space. x86-64 SysV doesn't give the caller license to introduce aliasing.)
Using an otherwise-private local as the destination (instead of separate dedicated space and then copying to a real local) is fine, but pointers that may point to something reachable another way must not be used. This requires escape analysis to make sure that a pointer to such a local hasn't been passed outside of the function.
I think the x86-64 SysV calling convention models the C abstract machine here by having the caller provide a real return-value object, rather than forcing the callee to invent that temporary when needed to make sure all writes to the retval happen after any other writes. Requiring the callee to do that is not what "the caller provides space for the return value" means, IMO.
That's definitely how GCC and other compilers interpret it in practice, which is a big part of what matters in a calling convention that's been around this long (since a year or two before the first AMD64 silicon, so very early 2000s).
Here's a case where your optimization would break if it were done:
struct Vec3{
    double x, y, z;
};
struct Vec3 glob3;
__attribute__((noinline))
struct Vec3 do_something(void) { // copy glob3 to retval in some order
    return (struct Vec3){glob3.y, glob3.z, glob3.x};
}
__attribute__((noinline))
void use(struct Vec3 * out){ // copy do_something() result to *out
    *out = do_something();
}
void caller(void) {
    use(&glob3);
}
With the optimization you're suggesting, do_something's output object would be glob3. But it also reads glob3.
A valid implementation for do_something would be to copy elements from glob3 to (%rdi) in source order, which would do glob3.x = glob3.y before reading glob3.x as the 3rd element of the return value.
That is in fact exactly what gcc -O1 does (Godbolt compiler explorer):
do_something:
movq %rdi, %rax # tmp90, .result_ptr
movsd glob3+8(%rip), %xmm0 # glob3.y, glob3.y
movsd %xmm0, (%rdi) # glob3.y, <retval>.x
movsd glob3+16(%rip), %xmm0 # glob3.z, _2
movsd %xmm0, 8(%rdi) # _2, <retval>.y
movsd glob3(%rip), %xmm0 # glob3.x, _3
movsd %xmm0, 16(%rdi) # _3, <retval>.z
ret
Notice the glob3.y, <retval>.x store before the load of glob3.x.
So without restrict anywhere in the source, GCC already emits asm for do_something that assumes no aliasing between the retval and glob3.
I don't think using struct Vec3 *restrict out would help at all: that only tells the compiler that inside use() you won't access the *out object through any other name. Since use() doesn't reference glob3, it's not UB to pass &glob3 as an arg to a restrict version of use.
I may be wrong here; @M.M argues in comments that *restrict out might make this optimization safe because the execution of do_something() happens during use(). (Compilers still don't actually do it, but maybe they would be allowed to for restrict pointers.)
Update: Richard Biener said in the GCC missed-optimization bug report that M.M is correct: if the compiler can prove that the function returns normally (no exception or longjmp), the optimization is legal in theory (but still not something GCC is likely to look for):
If so, restrict would make this optimization safe if we can prove that do_something is "noexcept" and doesn't longjmp.
Yes.
There's a noexcept declaration (in C++), but there isn't (AFAIK) a nolongjmp declaration you can put on a prototype.
So that means it's only possible (even in theory) as an inter-procedural optimization when we can see the other function's body. Unless noexcept also means no longjmp.