Loop vectorization gives different answer

Problem description

I am building some unit tests and find that my code gives a slightly different result when vectorized. In my example case below, an array a is summed in one dimension and added to an initial value x. Most elements of a are too small to change x. The code is:

module datamod
   use ISO_FORTRAN_ENV, only : dp => REAL64
   implicit none

   ! -- Array dimensions are large enough for gfortran to vectorize
   integer, parameter :: N = 6
   integer, parameter :: M = 10
   real(dp) :: x(N), a(N,M)

contains
subroutine init_ax
   ! -- Set a and x so the issue manifests

   x = 0.
   x(1) =  0.1e+03_dp

   a = 0.
   ! -- Each negative component is too small to individually change x(1)
   ! -- But the positive component is just big enough
   a(   1,   1) =  -0.4e-14_dp
   a(   1,   2) =  -0.4e-14_dp
   a(   1,   3) =  -0.4e-14_dp
   a(   1,   4) =   0.8e-14_dp
   a(   1,   5) =  -0.4e-14_dp
end subroutine init_ax
end module datamod

program main
   use datamod, only : a, x, N, M, init_ax
   implicit none
   integer :: i, j

   call init_ax

   ! -- The loop in question
   do i=1,N
      do j=1,M
         x(i) = x(i) + a(i,j)
      enddo
   enddo

   write(*,'(a,e26.18)') 'x(1) is: ', x(1)
end program main

The code gives the following results in gfortran without and with loop vectorization. Note that -ftree-vectorize is included in -O3, so the problem also manifests when using -O3.

mach5% gfortran -O2 main.f90 && ./a.out
x(1) is:   0.100000000000000014E+03
mach5% gfortran -O2 -ftree-vectorize main.f90 && ./a.out
x(1) is:   0.999999999999999858E+02

I know that certain compiler options can change the answer, such as -fassociative-math. However, none of those are included in the standard -O3 optimization package according to the gcc optimization options page.

It seems to me as though the vectorized code is adding up all components of a first, and then adding to x. However, this is incorrect because the code as written requires each component of a to be added to x.
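
To make the arithmetic behind that suspicion concrete, here is a minimal sketch that evaluates both summation orders in IEEE double precision. It is written in Python only because the interpreter performs the additions exactly in the order written; the function names and the constant `ULP` are made up for this illustration, and the sketch says nothing about the instructions gfortran actually emits.

```python
# Illustrative only: models two summation orders in IEEE double precision.
# It does not inspect or reproduce the code gfortran generates.

X0 = 100.0  # x(1) = 0.1e+03_dp
# Row a(1,:) from the question: five nonzero entries, then zeros up to M = 10.
A_ROW = [-0.4e-14, -0.4e-14, -0.4e-14, 0.8e-14, -0.4e-14] + [0.0] * 5


def add_each_term_to_x(x0, terms):
    """Source order: x = x + a(j) for each j, rounding after every add."""
    x = x0
    for t in terms:
        x = x + t
    return x


def sum_terms_then_add(x0, terms):
    """Hypothesized order: accumulate the terms first, then add once to x."""
    partial = 0.0
    for t in terms:
        partial = partial + t
    return x0 + partial


ULP = 2.0 ** -46  # spacing of doubles in [64, 128), about 1.42e-14

in_order = add_each_term_to_x(X0, A_ROW)
terms_first = sum_terms_then_add(X0, A_ROW)

print(f"spacing near 100   : {ULP:.3e}")
print(f"source order       : {in_order:.17e}")
print(f"terms summed first : {terms_first:.17e}")
```

In source order, each -0.4e-14 is smaller than half the spacing and is rounded away, while 0.8e-14 is just over half and bumps x(1) up by one spacing. When the terms are summed first, their total of about -0.8e-14 pushes x(1) down by one spacing instead. These are the two values printed above, so the hypothesized reordering is at least consistent with the observed difference.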

What is going on here? Can loop vectorization change the answer in some cases? Gfortran versions 4.7 and 5.3 had the same problem, but Intel 16.0 and PGI 15.10 did not.

Solution

I copied the code you provided (to a file called test.f90) and then I compiled and ran it using version 4.8.5 of gfortran. I found that results from the -O2 and -O2 -ftree-vectorize options differ just as your results differ. However, when I simply used -O3, I found that the results matched -O2.

$ gfortran --version
GNU Fortran (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11)
Copyright (C) 2015 Free Software Foundation, Inc.

GNU Fortran comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of GNU Fortran
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING

$ gfortran -O2 test.f90 && ./a.out
x(1) is:   0.100000000000000014E+03
$ gfortran -O2 -ftree-vectorize test.f90 && ./a.out
x(1) is:   0.999999999999999858E+02
$ gfortran -O3 test.f90 && ./a.out
x(1) is:   0.100000000000000014E+03
