Fortran intrinsic timing routines, which is better? cpu_time or system_clock

Problem description

When timing a Fortran program I usually just use call cpu_time(t).
Then I stumbled across call system_clock([count,count_rate,count_max]), which seems to do the same thing, only in a more cumbersome manner. My knowledge of these comes from old Intel documentation.
I wasn't able to find it on Intel's homepage. See my notes below.

  1. Which is the more accurate, or are they similar?
  2. Do one of them count cache misses (or other of the sorts) and the other not, or do any of them?
  3. Or is the only difference the one I marked in my notes below?

Those are my questions. Below I have supplied some code so you can see timings and usage. The results show that the two give very similar output and thus seem to be similar in implementation.
I should note that I will probably always stick with cpu_time, and that I don't really need more precise timings.

In the code below I have tried to compare them. (I have also tried more elaborate things, but will not include them for brevity.) So basically my results are:

  • cpu_time

    1. Is easier to use; you don't need the initialization calls.
    2. Gives the time directly as a difference (t2 - t1).
    3. Should also be compiler specific, but there is no way to query its precision (typically milliseconds).
    4. Is the sum of the time of all threads, i.e. not recommended for parallel runs.

  • system_clock

    1. Needs an initialization call to obtain count_rate (and count_max).
    2. Needs post-processing in the form of a division by count_rate (a small thing, but nonetheless a difference).
    3. Is compiler specific. On my PC the following was found:
      • Intel 12.0.4 uses a count rate of 10000, due to the INTEGER precision.
      • gcc-4.4.5 uses 1000; I do not know what causes the difference.
    4. Is prone to wraparound, i.e. c1 > c2 when the count passes count_max between the two calls.
    5. Measures time from a fixed reference point, so it yields the actual elapsed time of one thread rather than the sum over threads.
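Points 1, 2 and 4 of the system_clock notes can be folded into one small helper. The module below is a minimal sketch, not code from the question; the names clock_utils, query_clock and elapsed_seconds are hypothetical. It assumes the standard behaviour that the count resets to zero on the tick after reaching count_max, and it passes INTEGER(int64) arguments. The standard leaves count_rate processor dependent, and compilers may pick a finer rate for 8-byte arguments than for default INTEGER ones, so the rate printed here need not match the 10000 or 1000 seen above.

```fortran
module clock_utils
  ! Hypothetical helper module: wraps SYSTEM_CLOCK using 64-bit counters.
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  private
  public :: query_clock, elapsed_seconds

contains

  ! Return the tick rate (ticks per second) and the wraparound limit.
  subroutine query_clock(rate, cmax)
    integer(int64), intent(out) :: rate, cmax
    call system_clock(count_rate=rate, count_max=cmax)
  end subroutine query_clock

  ! Convert two SYSTEM_CLOCK readings into elapsed seconds.
  ! If c2 < c1 the counter wrapped once: it ran from c1 up to cmax,
  ! reset to 0, then ran up to c2. Multiple wraps cannot be detected.
  pure function elapsed_seconds(c1, c2, rate, cmax) result(seconds)
    integer(int64), intent(in) :: c1, c2, rate, cmax
    real(real64) :: seconds
    integer(int64) :: ticks

    if (rate <= 0_int64) then
      seconds = -1.0_real64   ! sentinel: no clock available on this processor
      return
    end if

    if (c2 >= c1) then
      ticks = c2 - c1
    else
      ! (cmax - c1) + c2 + 1 never exceeds cmax here, so it cannot overflow
      ticks = (cmax - c1) + c2 + 1_int64
    end if
    seconds = real(ticks, real64) / real(rate, real64)
  end function elapsed_seconds

end module clock_utils
```

In the question's loop, c1 and c2 would then be declared INTEGER(int64), the rate and limit obtained once from query_clock, and (c2 - c1)/rate replaced by elapsed_seconds(c1, c2, rate, cmax).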

Code:

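PROGRAM timer
  IMPLICIT NONE
  REAL :: t1,t2,rate
  INTEGER :: c1,c2,cr,cm,i,j,n,s
  INTEGER , PARAMETER :: x=20000,y=15000,runs=1000
  ! A 20000 x 15000 default REAL array needs about 1.2 GB of memory
  REAL :: array(x,y),a_diff,diff

  ! First initialize the system_clock: ticks per second and wraparound limit
  CALL system_clock(count_rate=cr)
  CALL system_clock(count_max=cm)
  rate = REAL(cr)
  WRITE(*,*) "system_clock rate ",rate

  diff = 0.0     ! running sum of (wall time - CPU time)
  a_diff = 0.0   ! running sum of |wall time - CPU time|
  s = 0          ! number of runs where system_clock < cpu_time
  DO n = 1 , runs
     ! Start both timers
     CALL CPU_TIME(t1)
     CALL SYSTEM_CLOCK(c1)
     ! Workload being timed
     FORALL(i = 1:x,j = 1:y)
        array(i,j) = REAL(i)*REAL(j) + 2
     END FORALL
     ! Stop both timers
     CALL CPU_TIME(t2)
     CALL SYSTEM_CLOCK(c2)
     ! Touch the array after timing (presumably to keep the FORALL from being optimized away)
     array(1,1) = array(1,2)
     ! (c2 - c1)/rate is the elapsed wall time in seconds; wraparound is not handled
     IF ( (c2 - c1)/rate < (t2-t1) ) s = s + 1
     diff = (c2 - c1)/rate - (t2-t1) + diff
     a_diff = ABS((c2 - c1)/rate - (t2-t1)) + a_diff
  END DO

  ! Timings of the last run, then statistics over all runs
  WRITE(*,*) "system_clock : ",(c2 - c1)/rate
  WRITE(*,*) "cpu_time     : ",(t2-t1)
  WRITE(*,*) "sc < ct      : ",s,"of",runs
  WRITE(*,*) "mean diff    : ",diff/runs
  WRITE(*,*) "abs mean diff: ",a_diff/runs
END PROGRAM timer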

For completeness, here is the output from my Intel 12.0.4 and gcc-4.4.5 compilers.

  • Intel 12.0.4 with -O0

    system_clock rate    10000.00    
    system_clock :    2.389600    
    cpu_time     :    2.384033    
    sc < ct      :            1 of        1000
    mean diff    :   4.2409324E-03
    abs mean diff:   4.2409897E-03
    
    real    42m5.340s
    user    41m48.869s
    sys 0m12.233s
    

  • gcc-4.4.5 with -O0

    system_clock rate    1000.0000    
    system_clock :    1.1849999    
    cpu_time     :    1.1840820    
    sc < ct      :          275 of        1000  
    mean diff    :   2.05709646E-03  
    abs mean diff:   2.71424348E-03  
    
    real    19m45.351s  
    user    19m42.954s  
    sys 0m0.348s  
    

Thanks for reading...

Solution

These two intrinsics report different types of time. system_clock reports "wall time" or elapsed time. cpu_time reports time used by the CPU. On a multi-tasking machine these could be very different, e.g., if your process shared the CPU equally with three other processes and therefore received 25% of the CPU and used 10 CPU seconds, it would take about 40 seconds of actual elapsed or wall-clock time.
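The 25% figure in the answer is simply the ratio of the two measurements: 10 CPU seconds over 40 wall-clock seconds gives 0.25. The sketch below (module and procedure names are hypothetical) times one serial loop with both intrinsics and computes that ratio. On an otherwise idle machine a serial loop should give a ratio close to 1; on compilers where cpu_time adds up the time of all threads, as the question observes, a multithreaded run can give a ratio above 1.

```fortran
module time_compare
  ! Hypothetical helpers illustrating CPU time versus wall-clock time.
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  private
  public :: cpu_share, measure_both

contains

  ! Fraction of the elapsed time during which the process was using a CPU.
  pure function cpu_share(cpu_seconds, wall_seconds) result(share)
    real(real64), intent(in) :: cpu_seconds, wall_seconds
    real(real64) :: share
    share = cpu_seconds / wall_seconds
  end function cpu_share

  ! Time a simple serial workload with both intrinsics.
  subroutine measure_both(n, cpu_seconds, wall_seconds, checksum)
    integer, intent(in) :: n
    real(real64), intent(out) :: cpu_seconds, wall_seconds, checksum
    real(real64) :: t1, t2
    integer(int64) :: c1, c2, rate
    integer :: i

    call system_clock(count_rate=rate)
    call cpu_time(t1)
    call system_clock(c1)
    checksum = 0.0_real64
    do i = 1, n
      checksum = checksum + sqrt(real(i, real64))
    end do
    call cpu_time(t2)
    call system_clock(c2)

    cpu_seconds = t2 - t1
    ! Wraparound of the counter is ignored here for brevity
    wall_seconds = real(c2 - c1, real64) / real(rate, real64)
  end subroutine measure_both

end module time_compare
```

Returning the checksum gives the loop an observable result, so an optimizing compiler should not remove the work being timed.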
