Return value from writing an unused parameter when falling off the end of a non-void function


Problem description

In this golfing answer I saw a trick where the return value is the second parameter which is not passed in.

int f(i, j) 
{
    j = i;   
}

int main() 
{
    return f(3);
}

From gcc's assembly output, it looks like when the code copies j = i, it stores the result in eax, which happens to be the return-value register.

f:
        pushq   %rbp
        movq    %rsp, %rbp
        movl    %edi, -4(%rbp)
        movl    %esi, -8(%rbp)
        movl    -4(%rbp), %eax
        movl    %eax, -8(%rbp)
        nop
        popq    %rbp
        ret
main:
        pushq   %rbp
        movq    %rsp, %rbp
        movl    $3, %edi
        movl    $0, %eax
        call    f
        popq    %rbp
        ret 

So, did this happen just by being lucky? Is this documented by gcc? It only works with -O0, but it works with a bunch of values of i I tried, -m32, and a bunch of different versions of GCC.

Solution

gcc -O0 likes to evaluate expressions in the return-value register, if a register is needed at all. (GCC -O0 generally just likes to have values in the retval register, but this goes beyond picking that as the first temporary.)

I've tested a bit, and it really looks like GCC -O0 does this on purpose across multiple ISAs, sometimes even using an extra mov instruction or equivalent. IIRC I made an expression more complicated so the result of evaluation ended up in another register, but it still copied it back to the retval register.

Things like x++ that can (on x86) compile to a memory-destination inc or add won't leave the value in a register, but assignments typically will. So it's not quite like GCC is treating function bodies like GNU C statement-expressions.
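
For comparison, here is a small sketch of what a real GNU C statement-expression does. The function names are made up for illustration, and the ({ ... }) syntax is a GNU extension accepted by GCC and clang, not ISO C. Note that a statement-expression ending in x++ still has a value (the old x), whereas a gcc -O0 function body ending in x++ leaves nothing useful in the return-value register.

```c
/* GNU C extension: ({ ... }) evaluates to the value of its last
   expression statement. Requires GCC or clang with GNU extensions. */

/* Hypothetical name: same shape as the golfed body, but with a defined result. */
int copy_via_stmt_expr(int i)
{
    return ({
        int j;
        j = i;      /* last expression statement: its value is the result */
    });
}

/* Hypothetical name: x++ as the last statement yields the old value of x. */
int post_inc_via_stmt_expr(int x)
{
    return ({ x++; });
}
```

Calling copy_via_stmt_expr(3) gives 3 and post_inc_via_stmt_expr(5) gives 5, at any optimization level, because the value is actually returned.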


This is not documented, guaranteed, or standardized by anything. It's an implementation detail, not something intended for you to take advantage of like this.

"Returning" a value this way means you're programming in "GCC -O0", not C. The wording of the code-golf rules says that programs have to work on at least one implementation. But my reading of that is that they should work for the right reasons, not because of some side-effect implementation detail. They break on clang not because clang doesn't support some language feature, just because they're not even written in C.
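
For contrast, a version written in actual C costs only a few more characters. This is a minimal sketch; the name f_portable is made up here so it doesn't clash with the golfed f.

```c
/* Portable ISO C: the value is returned explicitly, so this works on any
   conforming compiler at any optimization level. */
int f_portable(int i)
{
    return i;
}
```

Here f_portable(3) evaluates to 3 on GCC, clang, or anything else, with or without optimization.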

Breaking with optimization enabled is also not cool; some level of UB is generally acceptable in code golf, like integer wraparound or pointer-casting type punning being things that one might reasonably wish were well-defined. But this is pure abuse of an implementation detail of one compiler, not a language feature.
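
Outside of golf, the two kinds of UB mentioned above both have well-defined replacements. The sketch below shows one common way to get each; the function names are made up for illustration, and float_bits assumes a 32-bit float (the expected bit pattern in the usage note further assumes IEEE 754 binary32).

```c
#include <limits.h>   /* UINT_MAX, for callers that want to test wraparound */
#include <stdint.h>
#include <string.h>

/* Wraparound without UB: unsigned arithmetic is defined modulo 2^N. */
unsigned wrapping_add(unsigned a, unsigned b)
{
    return a + b;
}

/* Assumption: float and uint32_t have the same size. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Type punning without pointer casts: copy the object representation. */
uint32_t float_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}
```

wrapping_add(UINT_MAX, 1u) is 0 on every conforming implementation, and on an IEEE 754 machine float_bits(1.0f) is 0x3F800000.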

I argued this point in comments under the relevant answer on the Codegolf.SE "C golfing tips" Q&A (which incorrectly claims it works beyond GCC). That answer has 4 downvotes (and deserves more IMO), but 16 upvotes. So some members of the community disagree that this is terrible and silly.


Fun fact: in ISO C++ (but not C), having execution fall off the end of a non-void function is Undefined Behaviour, even if the caller doesn't use the result. This is true even in GNU C++; outside of -O0, GCC and clang will sometimes emit code like ud2 (illegal instruction) for a path of execution that reaches the end of a function without a return. So GCC doesn't in general define the behaviour here (which implementations are allowed to do for things that ISO C and C++ leave undefined, e.g. gcc -fwrapv defines signed overflow as 2's complement wraparound).

But in ISO C, it's legal to fall off the end of a non-void function: it only becomes UB if the caller uses the return value. Without -Wall GCC may not even warn. See also the related question "Checking return value of a function without return statement".
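
The ISO C rule can be illustrated with a function that only returns a value on some paths. This is a sketch with made-up names; it is valid when compiled as C, but the same source compiled as C++ would have undefined behaviour on the fall-through path. Compilers may still warn about the missing return.

```c
/* Valid ISO C: control may reach the closing brace without a return,
   as long as the caller does not use the value on that path. */
int positive_or_nothing(int x)
{
    if (x > 0)
        return x;
    /* x <= 0: falls off the end; the result must not be used */
}

/* Hypothetical caller that only reads the result when a return was reached. */
int use_it_safely(int x)
{
    if (x <= 0) {
        positive_or_nothing(x);     /* result discarded: defined in ISO C */
        return -1;
    }
    return positive_or_nothing(x);  /* a return statement was executed */
}
```

use_it_safely(5) is 5 and use_it_safely(0) is -1. Reading the result of positive_or_nothing(0) instead would be exactly the UB the golfed f(3) relies on.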

With optimization disabled, function inlining won't happen so the UB isn't really compile-time visible. (Unless you use __attribute__((always_inline))).


Passing a 2nd arg merely gives you something to assign to. It's not important that it's a function arg. But i=i; optimizes away even with -O0 so you do need a separate variable. Also just i; optimizes away.

Fun fact: a recursive f(i){ f(i); } function body does bounce i through EAX before copying it to the first arg-passing register. So GCC just really loves EAX.

        movl    -4(%rbp), %eax
        movl    %eax, %edi
        movl    $0, %eax             # no full prototype, so pass the number of FP args in AL
        call    f

i++; doesn't load into EAX; it just uses a memory-destination add without loading into a register. Worth trying with gcc -O0 for ARM.
