What causes the runtime difference in this trivial Fortran code?

Views: 165 | Published: 2018/3/16 17:28:31 | Tags: performance, fortran


Question


I observed a very curious effect in this trivial program:

module Moo 
contains
   subroutine main()
      integer :: res 
      real :: start, finish
      integer :: i

      call cpu_time(start)

      do i = 1, 1000000000
         call Squared(5, res) 
      enddo
      call cpu_time(finish)

      print '("Time = ",f6.3," seconds.")',finish-start
   end subroutine

   subroutine Squared(v, res)
      integer, intent(in) :: v
      integer, intent(out) :: res 

      res = v*v 
   end subroutine 

!   subroutine main2()
!      integer :: res
!      real :: start, finish
!      integer :: i
!
!      call cpu_time(start)
!      
!      do i = 1, 1000000000
!         res = v*v
!      enddo
!      call cpu_time(finish)
!
!      print '("Time = ",f6.3," seconds.")',finish-start
!   end subroutine

end module
program foo 
   use Moo 
   call main()
!   call main2()
end program

The compiler is gfortran 4.6.2 on a Mac. If I compile with -O0 and run the program, the timing is 4.36 seconds. If I uncomment the subroutine main2(), but not its call, the timing becomes 4.15 seconds on average. If I also uncomment the call to main2(), the first timing becomes 3.80 seconds and the second 1.86 seconds (understandable, since main2 has no function call in its loop).

I compared the assembler produced in the second and third cases (routine uncommented; call commented and uncommented) and they are exactly the same, save for the actual invocation of the main2 routine.

How can the code get this performance increase from a call to a routine that only happens later, with basically no difference in the generated code?

Answer

The first thing I noticed is that your program runs for far too short a time for proper benchmarking. How many runs did you average over? What is the standard deviation? I added a nested do loop to your code to make it run longer:

do i = 1, 1000000000
  do j=1,10
    call Squared(5, res) 
  enddo
enddo
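As a rough sketch of how the nested loop above could be timed several times within one execution, the program below wraps it in a helper routine and repeats the measurement. The names bench, time_once, n_runs and n_outer are hypothetical and not part of the original code, and the outer loop count is reduced here so the example finishes quickly. Like the original, it is meant to be compiled with -O0, since an optimizing build may remove the loop body altogether.

```fortran
module bench
   implicit none
contains
   subroutine Squared(v, res)
      integer, intent(in) :: v
      integer, intent(out) :: res

      res = v*v
   end subroutine

   ! Times one pass over the nested loop and returns the elapsed CPU time.
   ! (Hypothetical helper, not in the original post.)
   subroutine time_once(n_outer, elapsed)
      integer, intent(in) :: n_outer
      real, intent(out) :: elapsed
      real :: start, finish
      integer :: i, j, res

      call cpu_time(start)
      do i = 1, n_outer
         do j = 1, 10
            call Squared(5, res)
         enddo
      enddo
      call cpu_time(finish)

      elapsed = finish - start
   end subroutine
end module

program repeat_timing
   use bench
   implicit none
   integer, parameter :: n_runs = 5           ! assumed number of repeats
   integer, parameter :: n_outer = 10000000   ! reduced from 1000000000 for a quick run
   real :: elapsed
   integer :: k

   ! Print one timing per repeat so the spread between runs is visible.
   do k = 1, n_runs
      call time_once(n_outer, elapsed)
      print '("Run ",i0,": Time = ",f6.3," seconds.")', k, elapsed
   enddo
end program
```

Repeating inside one process is not the same as launching the executable several times from a script, so the two approaches may show different run-to-run variation.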

I looked at only case 1 and case 2 (main2 commented and uncommented) because case 3 is different and irrelevant for this comparison. I would expect a slight increase in runtime in case 2, because of needing to load a larger executable into memory, even though that part is not used in the program.

So I did timing (3 runs each) for cases 1 and 2, for three compilers:

pgf90 10.6-0 64-bit target on x86-64 Linux -tp istanbul-64

Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.0.2.137 Build 20110112

GNU Fortran (GCC) 4.1.2 20080704 (Red Hat 4.1.2-51)

on AMD Opteron(tm) Processor 6134

The output of my script is:

exp 1 with pgf90:
Time = 30.619 seconds.
Time = 30.620 seconds.
Time = 30.686 seconds.
exp 2 with pgf90:
Time = 30.606 seconds.
Time = 30.693 seconds.
Time = 30.635 seconds.
exp 1 with ifort:
Time = 77.412 seconds.
Time = 77.381 seconds.
Time = 77.395 seconds.
exp 2 with ifort:
Time = 77.834 seconds.
Time = 77.853 seconds.
Time = 77.825 seconds.
exp 1 with gfortran:
Time = 68.713 seconds.
Time = 68.659 seconds.
Time = 68.650 seconds.
exp 2 with gfortran:
Time = 71.923 seconds.
Time = 74.857 seconds.
Time = 72.126 seconds.

Notice the time difference between case 1 and case 2 is largest for gfortran, and smallest for pgf90.
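To put a number on the averaging and standard deviation asked about at the start of this answer, the gfortran timings listed above can be summarized with a small helper. This is only a sketch: the module timing_stats and the functions mean and stddev are hypothetical names, and the sample standard deviation (dividing by n-1) is an assumed choice of estimator.

```fortran
module timing_stats
   implicit none
contains
   ! Arithmetic mean of the samples.
   function mean(x) result(m)
      real, intent(in) :: x(:)
      real :: m

      m = sum(x) / size(x)
   end function

   ! Sample standard deviation (divides by n-1); needs at least 2 samples.
   function stddev(x) result(s)
      real, intent(in) :: x(:)
      real :: s

      s = sqrt(sum((x - mean(x))**2) / (size(x) - 1))
   end function
end module

program summarize
   use timing_stats
   implicit none
   ! gfortran timings in seconds, copied from the script output above.
   real, parameter :: case1(3) = (/ 68.713, 68.659, 68.650 /)
   real, parameter :: case2(3) = (/ 71.923, 74.857, 72.126 /)

   print '("case 1: mean = ",f7.3,"  stddev = ",f6.3)', mean(case1), stddev(case1)
   print '("case 2: mean = ",f7.3,"  stddev = ",f6.3)', mean(case2), stddev(case2)
end program
```

For these figures the spread is roughly 0.03 s in case 1 and roughly 1.6 s in case 2, against a difference in means of about 4.3 s. Three runs per case is a very small sample, so these numbers are only indicative.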

EDIT: Stefano Borini pointed out that I overlooked the fact that only the loop is timed by the calls to cpu_time, so executable load time is out of the equation. The answer by AShelley suggests a possible reason for the difference. For longer runtimes, the difference between the two cases becomes minimal. Still, I observe a significant difference in the case of gfortran (see above).
