What causes the runtime difference in this trivial Fortran code?
I observed a very curious effect in this trivial program:
module Moo
contains
    subroutine main()
        integer :: res
        real :: start, finish
        integer :: i
        call cpu_time(start)
        do i = 1, 1000000000
            call Squared(5, res)
        enddo
        call cpu_time(finish)
        print '("Time = ",f6.3," seconds.")',finish-start
    end subroutine
    subroutine Squared(v, res)
        integer, intent(in) :: v
        integer, intent(out) :: res
        res = v*v
    end subroutine
!    subroutine main2()
!        integer :: res
!        real :: start, finish
!        integer :: i
!
!        call cpu_time(start)
!
!        do i = 1, 1000000000
!            res = v*v
!        enddo
!        call cpu_time(finish)
!
!        print '("Time = ",f6.3," seconds.")',finish-start
!    end subroutine
end module
program foo
    use Moo
    call main()
!    call main2()
end program
The compiler is gfortran 4.6.2 on a Mac. If I compile with -O0 and run the program, the timing is 4.36 seconds. If I uncomment the subroutine main2(), but not its call, the timing becomes 4.15 seconds on average. If I also uncomment the call to main2(), the first timing becomes 3.80 and the second 1.86 (understandable, since the second loop has no function call).
I compared the assembler produced in the second and third cases (routine uncommented; call commented and uncommented) and they are exactly the same, save for the actual invocation of the main2 routine.
How can the code get this performance increase from a call to a routine that is going to happen in the future, and basically no difference in the resulting code?
First thing I noticed was that your program is way too short for proper benchmarking. How many runs do you use to average? What is the standard deviation? I added a nested do loop to your code to make it longer:
do i = 1, 1000000000
    do j = 1, 10
        call Squared(5, res)
    enddo
enddo
I looked at only case 1 and case 2 (main2 commented and uncommented) because case 3 is different and irrelevant for this comparison. I would expect a slight increase in runtime in case 2, because of needing to load a larger executable into memory, even though that part is not used in the program.
So I did timing (3 runs each) for cases 1 and 2, for three compilers:
pgf90 10.6-0 64-bit target on x86-64 Linux -tp istanbul-64
Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.0.2.137 Build 20110112
GNU Fortran (GCC) 4.1.2 20080704 (Red Hat 4.1.2-51)
on AMD Opteron(tm) Processor 6134
The output of my script is:
exp 1 with pgf90:
Time = 30.619 seconds.
Time = 30.620 seconds.
Time = 30.686 seconds.
exp 2 with pgf90:
Time = 30.606 seconds.
Time = 30.693 seconds.
Time = 30.635 seconds.
exp 1 with ifort:
Time = 77.412 seconds.
Time = 77.381 seconds.
Time = 77.395 seconds.
exp 2 with ifort:
Time = 77.834 seconds.
Time = 77.853 seconds.
Time = 77.825 seconds.
exp 1 with gfortran:
Time = 68.713 seconds.
Time = 68.659 seconds.
Time = 68.650 seconds.
exp 2 with gfortran:
Time = 71.923 seconds.
Time = 74.857 seconds.
Time = 72.126 seconds.
Notice the time difference between case 1 and case 2 is largest for gfortran, and smallest for pgf90.
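To put numbers on that comparison, here is a minimal Fortran sketch that computes the mean and sample standard deviation of each set of three timings listed above. The names timing_stats and report are made up for illustration; only the timing values come from the script output.

program timing_stats
    implicit none
    ! Timings in seconds, copied from the script output above (3 runs each).
    real, parameter :: pgf90_1(3)    = [30.619, 30.620, 30.686]
    real, parameter :: pgf90_2(3)    = [30.606, 30.693, 30.635]
    real, parameter :: ifort_1(3)    = [77.412, 77.381, 77.395]
    real, parameter :: ifort_2(3)    = [77.834, 77.853, 77.825]
    real, parameter :: gfortran_1(3) = [68.713, 68.659, 68.650]
    real, parameter :: gfortran_2(3) = [71.923, 74.857, 72.126]

    call report('pgf90    exp 1', pgf90_1)
    call report('pgf90    exp 2', pgf90_2)
    call report('ifort    exp 1', ifort_1)
    call report('ifort    exp 2', ifort_2)
    call report('gfortran exp 1', gfortran_1)
    call report('gfortran exp 2', gfortran_2)
contains
    ! Print the mean and the sample standard deviation of one timing set.
    subroutine report(label, t)
        character(len=*), intent(in) :: label
        real, intent(in) :: t(:)
        real :: mean, sdev
        mean = sum(t) / size(t)
        sdev = sqrt(sum((t - mean)**2) / (size(t) - 1))
        print '(a,": mean = ",f7.3," s, std dev = ",f6.3," s")', label, mean, sdev
    end subroutine report
end program timing_stats

With only three runs per case the standard deviation is a rough estimate, so it is best read as an indication of spread rather than a precise figure.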
EDIT: Stefano Borini pointed out something I had overlooked: the calls to cpu_time bracket only the loop, so executable load time is out of the equation. The answer by AShelley suggests a possible reason for the difference. For longer runtimes, the difference between the two cases becomes minimal. Still, I observe a significant difference in the case of gfortran (see above).
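The questions about averaging and standard deviation could be answered from inside the program itself. The following is only a sketch of one way to do it, not the method used for the timings above: the module name bench, the function time_loop, and the values of nruns and n are all assumptions chosen for illustration. It repeats the cpu_time-bracketed loop several times and reports the mean and sample standard deviation.

module bench
    implicit none
contains
    subroutine Squared(v, res)
        integer, intent(in)  :: v
        integer, intent(out) :: res
        res = v*v
    end subroutine Squared

    ! Time n calls to Squared with cpu_time, as in the original main().
    function time_loop(n) result(elapsed)
        integer, intent(in) :: n
        real :: elapsed, start, finish
        integer :: i, res
        call cpu_time(start)
        do i = 1, n
            call Squared(5, res)
        enddo
        call cpu_time(finish)
        elapsed = finish - start
    end function time_loop
end module bench

program repeat_bench
    use bench
    implicit none
    integer, parameter :: nruns = 5           ! hypothetical number of repetitions
    integer, parameter :: n = 100000000       ! hypothetical iteration count per run
    real :: t(nruns), mean, sdev
    integer :: k

    do k = 1, nruns
        t(k) = time_loop(n)
    enddo

    mean = sum(t) / nruns
    sdev = sqrt(sum((t - mean)**2) / (nruns - 1))
    print '("Mean = ",f7.3," s, std dev = ",f7.3," s over ",i0," runs")', mean, sdev, nruns
end program repeat_bench

As in the question, this is meant to be compiled with -O0; with optimization enabled the compiler may inline or remove the call, and the loop would then no longer measure call overhead.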