GCC的尾巴呼叫优化如何'聪明'? [英] How 'smart' is GCC's Tail-Call-Optimisation?

查看:167
本文介绍了GCC的尾巴呼叫优化如何'聪明'?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述



For-Loop:

  #include< stdio.h> 
#define n(196607)

int main(){
long loop;
int count = 0;
for(loop = 0; loop< n; loop ++){
count ++;
}
printf(Result =%d \\\
,count);

返回0;
}

递归:

  #include< stdio.h> 
#define n(196607)

long递归(长循环){
return(loop> 0)?递归(loop-1)+1:0;
}

int main(){
long result;
result = recursive(n);
printf(Result =%d \\\
,结果);
返回0;
}

在看到这段代码时,我看到了递归(循环-1)+1 并且认为啊,这不是尾调用递归,因为它在调用递归完成后有工作。它需要递增返回值。

果然,没有优化,递归代码会触发堆栈溢出,就像您期望的那样。



加上 -O2 标志,但是没有遇到堆栈溢出,我认为堆栈被重用,而不是推动更多和更多的到堆栈 - 这是tco。

GCC显然可以检测到这个简单的情况(+1返回值)并优化它,但它走多远?



当递归调用不是最后一个要执行的操作时,gcc可以使用tco优化哪些限制?



附录:
我写了一个完全尾递归返回函数(); 版本的代码。
在9999999迭代循环中包装上述代码,我提出了以下时机:

  $ for f in *。可执行程序;做时间./$f>结果;在'* .exe'
+ ./forLoop.c.exe

真实0m3.650s
用户0m3.588s
中完成
+ sys 0m0.061s
+ for'* .exe'
+ ./recursive.c.exe

real 0m3.682s
user 0m3.588s
sys 0m0.093s
+ for'* .exe'
+ ./tail_recursive.c.exe

real 0m3.697s
user 0m3 .588s
sys 0m0.077s

所以一个(很简单但不是非常严格的)基准表明它确实看起来与所花费的时间顺序相同。

解决方案

只需反汇编代码,看看发生了什么。没有优化,我得到了这个:

pre $ code> 0x0040150B cmpl $ 0x0,0x10(%rbp)
0x0040150F jle 0x401523<递归+ 35>
0x00401511 mov 0x10(%rbp),%eax
0x00401514 sub $ 0x1,%eax
0x00401517 mov%eax,%ecx
0x00401519 callq 0x401500< recursive>

但是使用-O1,-O2或-O3,我得到了这个:

  0x00402D09 mov $ 0x2ffff,%edx 

这与尾部优化没有任何关系,但更激进的优化。编译器简单地将整个函数内联并预先计算出结果。



这可能就是为什么你最终在所有不同的基准测试案例中获得相同结果的原因。


I just had a discussion where the following two peices of C code were being discussed:

For-Loop:

#include <stdio.h>
#define n (196607)

int main() {
  long loop;
  int count=0;
  for (loop=0;loop<n;loop++) {
    count++;
  }
  printf("Result = %d\n",count);

  return 0;
}

Recursive:

#include <stdio.h>
#define n (196607)

long recursive(long loop) {
  return (loop>0) ? recursive(loop-1)+1: 0;
}

int main() {
  long result;
  result = recursive(n);
  printf("Result = %d\n",result);
  return 0;
}

On seeing this code, I saw recursive(loop-1)+1 and thought "ah, that's not tail call recursive" because it has work to do after the call to recursive is complete; it needs to increment the return value.

Sure enough, with no optimisation, the recursive code triggers a stack overflow, as you would expect.

with the -O2 flag however, the stack overflow is not encountered, which I take to mean that the stack is reused, rather than pushing more and more onto the stack - which is tco.

GCC can obviously detect this simple case (+1 to return value) and optimise it out, but how far does it go?

What are the limits to what gcc can optimise with tco, when the recursive call isn't the last operation to be performed?

Addendum: I've written a fully tail recursive return function(); version of the code. Wrapping the above in a loop with 9999999 iterations, I came up with the following timings:

$ for f in *.exe; do time ./$f > results; done
+ for f in '*.exe'
+ ./forLoop.c.exe

real    0m3.650s
user    0m3.588s
sys     0m0.061s
+ for f in '*.exe'
+ ./recursive.c.exe

real    0m3.682s
user    0m3.588s
sys     0m0.093s
+ for f in '*.exe'
+ ./tail_recursive.c.exe

real    0m3.697s
user    0m3.588s
sys     0m0.077s

so a (admittedly simple and not very rigorous) benchmark shows that it does indeed seem to be in the same order of time taken.

解决方案

Just disassemble the code and see what happened. Without optimizations, I get this:

0x0040150B  cmpl   $0x0,0x10(%rbp)
0x0040150F  jle    0x401523 <recursive+35>
0x00401511  mov    0x10(%rbp),%eax
0x00401514  sub    $0x1,%eax
0x00401517  mov    %eax,%ecx
0x00401519  callq  0x401500 <recursive>

But with -O1, -O2 or -O3 I get this:

0x00402D09  mov    $0x2ffff,%edx

This doesn't have anything to do with tail optimizations, but much more radical optimizations. The compiler simply inlined the whole function and pre-calculated the result.

This is likely why you end up with the same result in all your different cases of benchmarking.

这篇关于GCC的尾巴呼叫优化如何'聪明'?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆