Puzzling performance difference between ifort and gfortran


Question

Recently, I read a post on Stack Overflow about finding integers that are perfect squares. As I wanted to play with this, I wrote the following small program:

PROGRAM PERFECT_SQUARE
IMPLICIT NONE
INTEGER*8 :: N, M, NTOT
LOGICAL :: IS_SQUARE

N=Z'D0B03602181'
WRITE(*,*) IS_SQUARE(N)

NTOT=0
DO N=1,1000000000
  IF (IS_SQUARE(N)) THEN
    NTOT=NTOT+1
  END IF
END DO
WRITE(*,*) NTOT ! should find 31622 squares
END PROGRAM

LOGICAL FUNCTION IS_SQUARE(N)
IMPLICIT NONE
INTEGER*8 :: N, M

! check if negative
IF (N.LT.0) THEN
  IS_SQUARE=.FALSE.
  RETURN
END IF

! check if ending 4 bits belong to (0,1,4,9)
M=IAND(N,15)
IF (.NOT.(M.EQ.0 .OR. M.EQ.1 .OR. M.EQ.4 .OR. M.EQ.9)) THEN
  IS_SQUARE=.FALSE.
  RETURN
END IF

! try to find the nearest integer to sqrt(n)
M=DINT(SQRT(DBLE(N)))
IF (M**2.NE.N) THEN
  IS_SQUARE=.FALSE.
  RETURN
END IF

IS_SQUARE=.TRUE.
RETURN
END FUNCTION

Compiled with gfortran -O2, the running time is 4.437 seconds; with -O3 it is 2.657 seconds. I then thought that compiling with ifort -O2 might be faster, since it might have a faster SQRT function, but the running time turned out to be 9.026 seconds, and ifort -O3 gave the same result. I tried to analyze it using Valgrind, and the Intel-compiled program does indeed execute many more instructions.

My question is why? Is there a way to find out where exactly the difference comes from?

EDITS:

  • gfortran version 4.6.2 and ifort version 12.0.2
  • times were obtained by running time ./a.out and are the real/user times (sys was always almost 0)
  • this is on Linux x86_64, both gfortran and ifort are 64-bit builds
  • ifort inlines everything, gfortran only at -O3, but gfortran's assembly code is simpler than that of ifort, which uses xmm registers a lot
  • fixed a line of code: added NTOT=0 before the loop, which should fix the issue with other gfortran versions

When the complex IF statement is removed, gfortran takes about 4 times as long (10-11 seconds). This is to be expected, since the statement throws out about 75% of the numbers and so avoids computing the SQRT for them. ifort, on the other hand, takes only slightly longer without it. My guess is that something goes wrong when ifort tries to optimize the IF statement.
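The 75% figure can be checked directly: a square can only leave a remainder of 0, 1, 4, or 9 when divided by 16, so 12 of the 16 possible low-order bit patterns are rejected before SQRT is ever called. The following is a minimal sketch that enumerates those residues; the program name residue_filter and its variables are illustrative and not part of the original code.

```fortran
program residue_filter
  implicit none
  integer :: r, k
  logical :: is_residue(0:15)

  ! Mark every value that k**2 can take modulo 16.
  ! Checking k = 0..15 is enough, because (k+16)**2 and k**2
  ! differ by a multiple of 16.
  is_residue = .false.
  do k = 0, 15
    is_residue(iand(k*k, 15)) = .true.
  end do

  ! List the reachable residues and the fraction the filter rejects.
  do r = 0, 15
    if (is_residue(r)) write(*,*) 'possible residue: ', r
  end do
  write(*,*) 'rejected fraction: ', real(16 - count(is_residue)) / 16.0
end program residue_filter
```

Only the residues 0, 1, 4, and 9 are marked, so the rejected fraction comes out as 12/16 = 0.75, which matches the roughly fourfold slowdown seen with gfortran when the test is removed.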

EDIT2:

I tried ifort version 12.1.2.273 and it is much faster, so it looks like they fixed that.

Answer

What compiler versions are you using? Interestingly, this looks like a performance regression from 11.1 to 12.0: for me, 11.1 (ifort -fast square.f90) takes 3.96s, and 12.0 (same options) takes 13.3s. gfortran (4.6.1) (-O3) is still faster (3.35s). I have seen this kind of regression before, although not quite as dramatic. BTW, replacing the if statement with

is_square = any(m == [0, 1, 4, 9])
if(.not. is_square) return

makes it run twice as fast with ifort 12.0, but slower in gfortran and ifort 11.1.
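For context, here is one way the replacement might sit inside the function. This is a sketch rather than the answerer's exact code: the free-form layout, the kind=8 declarations (standing in for INTEGER*8), the intent(in) attribute, and the small driver program try_is_square are assumptions added so that the example compiles on its own.

```fortran
program try_is_square
  implicit none
  logical :: is_square   ! external function, defined below

  ! 49 is a perfect square, 50 is not.
  write(*,*) is_square(49_8), is_square(50_8)
end program try_is_square

logical function is_square(n)
  implicit none
  integer(kind=8), intent(in) :: n
  integer(kind=8) :: m

  ! Negative numbers are never perfect squares.
  is_square = .false.
  if (n < 0) return

  ! The low 4 bits of a square must be one of 0, 1, 4, 9.
  ! ANY over an array constructor replaces the chain of .OR. tests.
  m = iand(n, 15_8)
  is_square = any(m == [0, 1, 4, 9])
  if (.not. is_square) return

  ! Truncate sqrt(n) to an integer and check it by squaring.
  m = int(sqrt(dble(n)), kind=8)
  is_square = (m**2 == n)
end function is_square
```

The array constructor syntax [0, 1, 4, 9] needs a Fortran 2003 capable compiler. Whether this form is faster or slower depends on the compiler version, as the timings above show.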

It looks like part of the problem is that 12.0 is overly aggressive in trying to vectorize things: adding

!DEC$ NOVECTOR

right before the DO loop (without changing anything else in the code) cuts the run time down to 4.0 sec.
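A sketch of where the directive goes is shown below. To keep the example short and standard-conforming, it drops the initial hexadecimal test call and condenses the IF blocks, so it is not byte-for-byte the program from the question; the timings quoted above were measured on the original code.

```fortran
program perfect_square
  implicit none
  integer(kind=8) :: n, ntot
  logical :: is_square   ! external function, defined below

  ntot = 0
  ! Intel-specific directive: ask ifort not to vectorize the loop
  ! that follows. A compiler that does not recognize the directive
  ! sees an ordinary comment line.
!DEC$ NOVECTOR
  do n = 1, 1000000000
    if (is_square(n)) ntot = ntot + 1
  end do
  write(*,*) ntot   ! the question expects 31622 squares
end program perfect_square

logical function is_square(n)
  implicit none
  integer(kind=8), intent(in) :: n
  integer(kind=8) :: m

  is_square = .false.
  if (n < 0) return

  ! Same low-4-bit filter as in the question.
  m = iand(n, 15_8)
  if (.not. (m == 0 .or. m == 1 .or. m == 4 .or. m == 9)) return

  m = int(sqrt(dble(n)), kind=8)
  is_square = (m**2 == n)
end function is_square
```

The directive applies only to the DO loop that immediately follows it, so the rest of the program is still optimized as usual.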

Also, as a side benefit: if you have a multi-core CPU, try adding -parallel to the ifort command line :)
