Why does a subroutine with an array as an input give faster performance than the same subroutine with an automatic local array?


Question

I am rewriting some legacy code to improve readability and, hopefully, make it easier to maintain.

I am trying to reduce the number of input parameters to the subroutines, but I found that changing subroutine sub(N, ID) to subroutine sub(N) noticeably reduced performance.

ID is only used inside sub, so I don't think it makes sense to pass it in as an argument. Is it possible to use sub(N) without taking the performance hit? (For my uses, N < 10, where the performance is 5-10x worse.)

Performance comparison:

  1. sub_1

  • N = 4, 0.9 seconds
  • N = 20, 1.0 seconds
  • N = 200, 2.1 seconds

  2. sub_2

  • N = 4, 0.07 seconds
  • N = 20, 0.18 seconds
  • N = 200, 1.3 seconds

I am using macOS 10.14.6 with gfortran 5.2.0.

program test
  integer, parameter  :: N = 1
  real, dimension(N)  :: ID


  call CPU_time(t1)

  do i = 1, 10000000
    CALL sub_1(N)
  end do

  call CPU_time(t2)
  write ( *, * ) 'Elapsed real time =', t2 - t1



  call CPU_time(t1)

  do i = 1, 10000000
    CALL sub_2(N, ID)
  end do

  call CPU_time(t2)
  write ( *, * ) 'Elapsed real time =', t2 - t1

end program test



SUBROUTINE sub_1(N)
  integer,            intent(in)      :: N
  real, dimension(N)                  :: ID

  ID = 0.0

END SUBROUTINE sub_1



SUBROUTINE sub_2(N, ID)
  integer,            intent(in)      :: N
  real, dimension(N), intent(in out)  :: ID

  ID = 0.0

END SUBROUTINE sub_2

Recommended answer

This seems to be a "feature" of the old version of gfortran you are using. With later versions, at least for N = 10, the times are much more comparable:

ian@eris:~/work/stack$ head s.f90
program test
  integer, parameter  :: N = 10
  real, dimension(N)  :: ID


  call CPU_time(t1)

  do i = 1, 10000000
    CALL sub_1(N)
  end do
ian@eris:~/work/stack$ gfortran-5 --version
GNU Fortran (Ubuntu 5.5.0-12ubuntu1) 5.5.0 20171010
Copyright (C) 2015 Free Software Foundation, Inc.

GNU Fortran comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of GNU Fortran
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING

ian@eris:~/work/stack$ gfortran-5 -O3 s.f90
ian@eris:~/work/stack$ ./a.out
 Elapsed real time =  0.149489999    
 Elapsed real time =   1.99675560E-06
ian@eris:~/work/stack$ gfortran-6 --version
GNU Fortran (Ubuntu 6.5.0-2ubuntu1~18.04) 6.5.0 20181026
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

ian@eris:~/work/stack$ gfortran-6 -O3 s.f90
ian@eris:~/work/stack$ ./a.out
 Elapsed real time =   7.00005330E-06
 Elapsed real time =   5.00003807E-06
ian@eris:~/work/stack$ gfortran-7 --version
GNU Fortran (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

ian@eris:~/work/stack$ gfortran-7 -O3 s.f90
ian@eris:~/work/stack$ ./a.out
 Elapsed real time =   8.00006092E-06
 Elapsed real time =   6.00004569E-06
ian@eris:~/work/stack$ gfortran-8 --version
GNU Fortran (Ubuntu 8.3.0-6ubuntu1~18.04.1) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

ian@eris:~/work/stack$ gfortran-8 -O3 s.f90
ian@eris:~/work/stack$ ./a.out
 Elapsed real time =   9.00030136E-06
 Elapsed real time =   6.00004569E-06

However, I would take all of the above with a bucketful of salt. It is more than likely that the optimiser has worked out that it doesn't actually need to do anything in this simple case, and so has simply removed all the operations you want to time. The only benchmark that can actually tell you anything is the code you really want to run.
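To illustrate the point about the optimiser, here is a minimal sketch of a benchmark in which the result of each call is actually used. It is not from the original post: the extra arguments x and acc, the checksum output, and the program name bench_sketch are all assumptions added for illustration. Making the output depend on the array contents makes it harder, though not impossible, for the compiler to discard the work, so the timings should still be read with caution.

```fortran
program bench_sketch
  implicit none
  integer, parameter :: n = 10           ! array size used in the answer's test
  integer, parameter :: reps = 10000000  ! iteration count from the question
  real               :: id(n)
  real               :: t1, t2
  double precision   :: acc              ! hypothetical checksum, not in the original
  integer            :: i

  ! Variant with an automatic local array
  acc = 0.0d0
  call cpu_time(t1)
  do i = 1, reps
    call sub_1(n, real(mod(i, 7)), acc)
  end do
  call cpu_time(t2)
  write (*, *) 'sub_1 CPU time =', t2 - t1, ' checksum =', acc

  ! Variant with the array passed in as an argument
  acc = 0.0d0
  call cpu_time(t1)
  do i = 1, reps
    call sub_2(n, id, real(mod(i, 7)), acc)
  end do
  call cpu_time(t2)
  write (*, *) 'sub_2 CPU time =', t2 - t1, ' checksum =', acc

contains

  subroutine sub_1(n, x, acc)
    integer,          intent(in)     :: n
    real,             intent(in)     :: x
    double precision, intent(in out) :: acc
    real, dimension(n)               :: id   ! automatic array, sized at run time

    id = x
    acc = acc + sum(id)   ! use the result so the assignment is not dead code
  end subroutine sub_1

  subroutine sub_2(n, id, x, acc)
    integer,            intent(in)     :: n
    real, dimension(n), intent(in out) :: id
    real,               intent(in)     :: x
    double precision,   intent(in out) :: acc

    id = x
    acc = acc + sum(id)   ! same work as sub_1, but on the caller's array
  end subroutine sub_2

end program bench_sketch
```

Both loops should print the same checksum. If the two timings are still near zero, the compiler has probably inlined and simplified the calls, which again suggests timing the real code instead of a toy case.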
