gfortran支持尾叫消除吗? [英] Does gfortran support tail call elimination?
问题描述
如果gfortran确实消除了尾部调用,我做了这个小程序来测试:
I made this small program to test, if gfortran does tail call elimination:
program tailrec
implicit none
print *, tailrecsum(5, 0)
contains
recursive function tailrecsum (x, running_total) result (ret_val)
integer, intent(in) :: x
integer, intent(in) :: running_total
integer :: ret_val
if (x == 0) then
ret_val = running_total
return
end if
ret_val = tailrecsum (x-1, running_total + x)
end function tailrecsum
end program
要进行检查,我使用-S选项对其进行了编译,以查看说明.这是tailrecsum函数的行:
To check, I compiled it with the -S option, to take a look at the instructions. Here the lines for the tailrecsum function:
tailrecsum.3429:
.LFB1:
.cfi_startproc
movl (%rdi), %eax
testl %eax, %eax
jne .L2
movl (%rsi), %eax
ret
.p2align 4,,10
.p2align 3
.L2:
subq $24, %rsp
.cfi_def_cfa_offset 32
leal -1(%rax), %edx
addl (%rsi), %eax
leaq 8(%rsp), %rdi
leaq 12(%rsp), %rsi
movl %edx, 8(%rsp)
movl %eax, 12(%rsp)
call tailrecsum.3429
addq $24, %rsp
.cfi_def_cfa_offset 8
ret
.cfi_endproc
最后,我看到了 call tailrecsum.3429
,因此,认为没有尾声消除功能.当我使用 -O2
或 -O3
和 -foptimize-sibling-calls
时,也是如此.那么,gfortran不支持此功能吗?还是我的代码有问题?
At the end, I see call tailrecsum.3429
and therefore, think that there is no tail call elimination. This is the same also when I use -O2
or -O3
and -foptimize-sibling-calls
.
So, does gfortran not support this or is it a problem of my code?
推荐答案
它确实支持它.要避免很多非常细微的陷阱(它们会损害尾调用优化)是非常棘手的.
It does support it. It is quite tricky to avoid many very subtle traps which harm the tail call optimization.
如果按值传递参数,则编译器优化尾调用变得更加简单.在那种情况下,接收程序不需要指向它的临时对象(地址).
It becomes simpler for the compiler to optimize tail calls if you pass the arguments by value. In that case there is no temporary to which the receiving procedure needs to have a pointer (address).
实际上,此修改足以消除尾部调用并启用无限递归:
In fact, this modification is enough to get the tail call elimination and enable unlimited recursion:
recursive function tailrecsum (x, running_total) result (ret_val) bind(C)
integer, value :: x
integer, value :: running_total
integer :: ret_val
if (x == 0) then
ret_val = running_total
return
end if
ret_val = tailrecsum (x-1, running_total + x)
end function tailrecsum
Gfortran不需要 bind(C)
,因为它将所有 value
都实现为类似C的值传递.英特尔确实需要它,因为它会创建一个临时文件并传递其地址.
Gfortran does not require the bind(C)
because it implements all value
as C-like pass by value. Intel does require it because it creates a temporary and passes its address.
在不同的体系结构上,细节可能有所不同,具体取决于谁负责清理什么内容.
The details may differ on different architectures, depending on who is responsible for the cleanup of what.
考虑此版本:
program tailrec
use iso_fortran_env
implicit none
integer(int64) :: acc, x
acc = 0
x = 500000000
call tailrecsum(x, acc)
print *, acc
contains
recursive subroutine tailrecsum (x, running_total)
integer(int64), intent(inout) :: x
integer(int64), intent(inout) :: running_total
integer(int64) :: ret_val
if (x == 0) return
running_total = running_total + x
x = x - 1
call tailrecsum (x, running_total)
end subroutine tailrecsum
end program
经过5亿次迭代,它显然会在没有TCO的情况下炸毁堆栈,但事实并非如此:
With 500000000 iterations it would clearly blow the stack without TCO, but it does not:
> gfortran -O2 -frecursive tailrec.f90
> ./a.out
125000000250000000
您可以使用 -fdump-tree-optimized
检查编译器更容易执行的操作.老实说,我什至不去尝试理解您的程序集输出.X86组装对我来说太深奥了,我的大脑只能处理某些RISC.
You can examine what the compiler does more easily using -fdump-tree-optimized
. Honestly, I didn't even bother trying to understand your assembly output. X86 assembly is simply too esoteric for me, my simple brain can handle only certain RISCs.
您可以看到在原始版本中调用下一个迭代之后,仍有很多事情发生:
You can see that there is still a lot going on after the call to the next iteration in your original version:
<bb 6>:
_25 = _5 + -3;
D.1931 = _25;
_27 = _18 + _20;
D.1930 = _27;
ret_val_28 = tailrecsum (&D.1931, &D.1930);
D.1930 ={v} {CLOBBER};
D.1931 ={v} {CLOBBER};
<bb 7>:
# _29 = PHI <_20(5), ret_val_28(6)>
<bb 8>:
# _22 = PHI <_11(4), _29(7)>
<bb 9>:
# _1 = PHI <ret_val_7(3), _22(8)>
return _1;
}
我不是GIMPLE的专家,但是 D.193x
操作肯定链接到为调用而放在堆栈中的临时表达式.
I am not an expert in GIMPLE, but the D.193x
operations are definitely linked to the temporary expressions that are put on the stack for the call.
The PHI
operations then find which version of the return value will be actually returned based on which branch was actually taken in the if statement (https://gcc.gnu.org/onlinedocs/gccint/SSA.html).
就像我说的那样,有时将代码简化为正确的格式有时是棘手的,这对于gfortran进行尾部调用优化是可以接受的.
As I said it is sometimes tricky to simplify your code to the right form which is acceptable for gfortran to perform the tail call optimization.
这篇关于gfortran支持尾叫消除吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!