What causes the runtime difference in this trivial fortran code?

Problem description

I observed a very curious effect in this trivial program:

module Moo 
contains
   subroutine main()
      integer :: res 
      real :: start, finish
      integer :: i

      call cpu_time(start)

      do i = 1, 1000000000
         call Squared(5, res) 
      enddo
      call cpu_time(finish)

      print '("Time = ",f6.3," seconds.")',finish-start
   end subroutine

   subroutine Squared(v, res)
      integer, intent(in) :: v
      integer, intent(out) :: res 

      res = v*v 
   end subroutine 

!   subroutine main2()
!      integer :: res
!      real :: start, finish
!      integer :: i
!
!      call cpu_time(start)
!      
!      do i = 1, 1000000000
!         res = v*v
!      enddo
!      call cpu_time(finish)
!
!      print '("Time = ",f6.3," seconds.")',finish-start
!   end subroutine

end module
program foo 
   use Moo 
   call main()
!   call main2()
end program

The compiler is gfortran 4.6.2 on a Mac. If I compile with -O0 and run the program, the timing is 4.36 seconds. If I uncomment the subroutine main2() but not its call, the timing drops to 4.15 seconds on average. If I also uncomment the call to main2(), the first timing becomes 3.80 seconds and the second 1.86 seconds (understandable, since main2 makes no function call).

I compared the assembly generated in the second and third cases (routine uncommented; call commented and uncommented), and it is exactly the same except for the actual invocation of the main2 routine.

How can the code get this performance increase from a call to a routine that only executes later, when there is basically no difference in the resulting code?

Solution

The first thing I noticed is that your program runs far too briefly for proper benchmarking. How many runs do you average over? What is the standard deviation? I added a nested do loop to your code to make it run longer:

do i = 1, 1000000000
  do j=1,10
    call Squared(5, res) 
  enddo
enddo
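The questions above (how many runs are averaged, and what the standard deviation is) can be answered from inside a single executable by repeating the timed loop and summarizing the trials. This is a minimal sketch, not the code used by the poster or the answerer: the module `bench_sketch` and the procedures `time_loop` and `sample_stats` are hypothetical names, and the timed work is the question's `Squared` call loop, measured with `cpu_time` as in the original.

```fortran
! Hypothetical helper module (not from the original post): repeat the
! question's timed loop and summarize the trials.
module bench_sketch
   implicit none
contains
   ! Same work as the question's Squared subroutine.
   subroutine Squared(v, res)
      integer, intent(in) :: v
      integer, intent(out) :: res
      res = v*v
   end subroutine Squared

   ! One trial: CPU time in seconds spent calling Squared niter times.
   function time_loop(niter) result(seconds)
      integer, intent(in) :: niter
      real :: seconds
      integer :: i, res
      real :: start, finish
      call cpu_time(start)
      do i = 1, niter
         call Squared(5, res)
      end do
      call cpu_time(finish)
      seconds = finish - start
   end function time_loop

   ! Mean and sample standard deviation (n - 1 denominator) of the
   ! trial times; assumes at least one trial.
   subroutine sample_stats(t, mean, sd)
      real, intent(in) :: t(:)
      real, intent(out) :: mean, sd
      integer :: n
      n = size(t)
      mean = sum(t) / n
      if (n > 1) then
         sd = sqrt(sum((t - mean)**2) / (n - 1))
      else
         sd = 0.0
      end if
   end subroutine sample_stats
end module bench_sketch
```

A driver program would call `time_loop` once per trial, store the results in an array, and pass that array to `sample_stats`. At `-O0` the loop runs as written; at higher optimization levels a compiler may remove it entirely, because `res` is never used after the loop.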

I looked only at cases 1 and 2 (main2 commented and uncommented), because case 3 is different and irrelevant to this comparison. I would expect a slight increase in runtime in case 2, because a larger executable has to be loaded into memory, even though the extra routine is never called.

So I timed cases 1 and 2 (3 runs each) with three compilers, all on an AMD Opteron(tm) Processor 6134:

pgf90 10.6-0 64-bit target on x86-64 Linux -tp istanbul-64

Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.0.2.137 Build 20110112

GNU Fortran (GCC) 4.1.2 20080704 (Red Hat 4.1.2-51)

The output of my script is:

exp 1 with pgf90:
Time = 30.619 seconds.
Time = 30.620 seconds.
Time = 30.686 seconds.
exp 2 with pgf90:
Time = 30.606 seconds.
Time = 30.693 seconds.
Time = 30.635 seconds.
exp 1 with ifort:
Time = 77.412 seconds.
Time = 77.381 seconds.
Time = 77.395 seconds.
exp 2 with ifort:
Time = 77.834 seconds.
Time = 77.853 seconds.
Time = 77.825 seconds.
exp 1 with gfortran:
Time = 68.713 seconds.
Time = 68.659 seconds.
Time = 68.650 seconds.
exp 2 with gfortran:
Time = 71.923 seconds.
Time = 74.857 seconds.
Time = 72.126 seconds.

Notice the time difference between case 1 and case 2 is largest for gfortran, and smallest for pgf90.
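To put numbers on "largest for gfortran, smallest for pgf90", the relative change in mean runtime from case 1 to case 2 can be computed directly from the timings listed above. This is a minimal sketch; the module `timing_summary` and the function `pct_change` are hypothetical names introduced for illustration.

```fortran
! Hypothetical helper (not from the original post): percentage change
! of the mean runtime from case 1 (main2 commented out) to case 2
! (main2 uncommented).
module timing_summary
   implicit none
   integer, parameter :: dp = kind(1.0d0)
contains
   pure function pct_change(case1, case2) result(pct)
      real(dp), intent(in) :: case1(:), case2(:)
      real(dp) :: pct
      real(dp) :: mean1, mean2
      mean1 = sum(case1) / size(case1)
      mean2 = sum(case2) / size(case2)
      pct = 100.0_dp * (mean2 - mean1) / mean1
   end function pct_change
end module timing_summary
```

For example, `pct_change([68.713_dp, 68.659_dp, 68.650_dp], [71.923_dp, 74.857_dp, 72.126_dp])` evaluates to about 6.25 for the gfortran runs, while the ifort and pgf90 runs give about 0.57 and 0.01, which matches the ordering noted above.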

EDIT: Stefano Borini pointed out that I overlooked the fact that the cpu_time calls bracket only the loop, so executable load time is out of the equation. The answer by AShelley suggests a possible reason for the difference. For longer runtimes, the difference between the two cases becomes minimal. Still, I observe a significant difference with gfortran (see above).
