The sum function returns answers different from an explicit loop

Question

I am converting f77 code to f90 code, and part of the code needs to sum over the elements of a 3D matrix. In f77 this was done with three nested loops (over the outer, middle and inner indices). I decided to use the f90 intrinsic SUM (three times) instead, and much to my surprise the answers differ. I am using the ifort compiler with debugging and bounds checking turned on and optimization turned off.

Here is the f77-style code:

r1 = 0.0
do k=1,nz
  do j=1,ny
    do i=1,nx
      r1 = r1 + foo(i,j,k)
    end do
  end do
end do

Here is the f90 code:

r = SUM(SUM(SUM(foo, DIM=3), DIM=2), DIM=1)

I have tried all sorts of variations, such as swapping the order of the loops in the f77 code, or creating temporary 2D matrices and 1D arrays to "reduce" the dimensions while using SUM, but the explicit f77-style loops always give different answers from the f90 SUM function.

I'd appreciate any suggestions that help me understand the discrepancy.

By the way, this is running serially on a single processor.

Edited 12:13 pm to show a complete example:

! ifort -check bounds -extend-source 132 -g -traceback -debug inline-debug-info -mkl -o verify  verify.f90
! ./verify

program verify

implicit none

integer :: nx,ny,nz

parameter(nx=131,ny=131,nz=131)

integer :: i,j,k
real :: foo(nx,ny,nz)
real :: r0,r1,r2
real :: s0,s1,s2
real :: r2Dfooxy(nx,ny),r1Dfoox(nx)

call random_seed
call random_number(foo)

r0 = 0.0
do k=1,nz
  do j=1,ny
    do i=1,nx
      r0 = r0 + foo(i,j,k)
    end do
  end do
end do

r1 = 0.0
do i=1,nx
  do j=1,ny
    do k=1,nz
      r1 = r1 + foo(i,j,k)
    end do
  end do
end do

r2 = 0.0
do j=1,ny
  do i=1,nx
    do k=1,nz
      r2 = r2 + foo(i,j,k)
    end do
  end do
end do

!*************************

s0 = 0.0
s0 = SUM(SUM(SUM(foo, DIM=3), DIM=2), DIM=1)

s1 = 0.0
r2Dfooxy = SUM(foo,   DIM = 3)
r1Dfoox  = SUM(r2Dfooxy, DIM = 2)
s1 = SUM(r1Dfoox)

s2 = SUM(foo)

!*************************

print *,'nx,ny,nz = ',nx,ny,nz
print *,'size(foo) = ',size(foo)

write(*,'(A,4(ES15.8))') 'r0,r1,r2          = ',r0,r1,r2
write(*,'(A,3(ES15.8))') 'r0-r1,r0-r2,r1-r2 = ',r0-r1,r0-r2,r1-r2

write(*,'(A,4(ES15.8))') 's0,s1,s2          = ',s0,s1,s2
write(*,'(A,3(ES15.8))') 's0-s1,s0-s2,s1-s2 = ',s0-s1,s0-s2,s1-s2

write(*,'(A,3(ES15.8))') 'r0-s1,r1-s1,r2-s1    = ',r0-s1,r1-s1,r2-s1

stop
end

!**********************************************

sample output

nx,ny,nz =          131         131         131
size(foo) =      2248091

r0,r1,r2          =  1.12398225E+06 1.12399525E+06 1.12397238E+06
r0-r1,r0-r2,r1-r2 = -1.30000000E+01 9.87500000E+00 2.28750000E+01
s0,s1,s2          =  1.12397975E+06 1.12397975E+06 1.12398225E+06
s0-s1,s0-s2,s1-s2 =  0.00000000E+00-2.50000000E+00-2.50000000E+00
r0-s1,r1-s1,r2-s1    =  2.50000000E+00 1.55000000E+01-7.37500000E+00

Recommended answer

The sum intrinsic function returns a processor-dependent approximation to the sum of the elements of the array argument. This is not the same thing as adding all the elements sequentially.

It is simple to find an array x where

summation = x(1) + x(2) + x(3)

(performed strictly left to right) is not the best approximation to the sum obtained by treating the values as "mathematical reals" rather than floating point numbers.
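One such array can be sketched as follows. The values are hypothetical and chosen only for illustration; the program and variable names are not from the original post. The sketch assumes default real is IEEE single precision, where the spacing between representable numbers near 1.0e8 is 8, so adding 1.0 to 1.0e8 is lost to rounding. Evaluated strictly left to right in single precision, the loop should therefore arrive at 0 even though the mathematical sum is 1. What the SUM line prints depends on the compiler and its options.

```fortran
program left_to_right
  implicit none
  real :: x(3), summation
  integer :: i

  ! Hypothetical values for illustration: the mathematical sum is exactly 1.
  x = [1.0e8, 1.0, -1.0e8]

  ! Strict left-to-right accumulation in default real.
  summation = 0.0
  do i = 1, size(x)
    summation = summation + x(i)
  end do

  print *, 'left to right       :', summation
  ! The order and precision used internally by SUM are up to the compiler.
  print *, 'SUM intrinsic       :', sum(x)
  ! Accumulate in double precision, then round back to default real.
  print *, 'double accumulation :', real(sum(dble(x)))
end program left_to_right
```

Reordering the elements as [1.0e8, -1.0e8, 1.0] makes the left-to-right loop exact, which is one way to see that the order of the additions, not the data, is responsible for the discrepancy.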

As a concrete example of the nature of the approximation with ifort, consider the following program. We need to enable optimizations here to see the effects, although the importance of the order of summation is apparent even with optimizations disabled (with -O0 or -debug).

  implicit none

  integer i
  real x(50)
  real total

  x = [1.,(EPSILON(0.)/2, i=1, SIZE(x)-1)]
  total = 0
  do i=1, SIZE(x)
     total = total+x(i)
     print '(4F17.14)', total, SUM(x(:i)), SUM(DBLE(x(:i))), REAL(SUM(DBLE(x(:i))))
  end do
end program

Adding up in strict order gives 1., because each added term, epsilon(0.)/2, is too small to change a running total of 1. in default real arithmetic.
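The absorption of a single small term can be checked in isolation. This is a minimal sketch, assuming default real is IEEE single precision with round-to-nearest and that the additions are actually evaluated in single precision (extended-precision intermediates could change the first result). The names one and half_eps are illustrative only.

```fortran
program absorb_demo
  implicit none
  real :: one, half_eps

  one = 1.0
  half_eps = epsilon(one) / 2

  ! Expected to print T under the assumptions above: the term is absorbed.
  print *, 'one + eps/2 == one :', (one + half_eps) == one
  ! Expected to print F: a full epsilon reaches the next representable value.
  print *, 'one + eps   == one :', (one + epsilon(one)) == one
end program absorb_demo
```

The 49 small terms in the array above are each absorbed this way when added one at a time to the running total, yet together they amount to roughly 24.5 times epsilon(0.), which a summation in a different order or at higher precision can retain.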

You can experiment with the size of the array and the order of its elements, the scaling of the small numbers, and the ifort floating point compilation options (such as -fp-model strict, -mieee-fp, -pc32). You can also try to construct an example like the one above using double precision instead of default real.
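For the array in the question, one way to judge which single precision result is closer is to compare both against a double precision accumulation, in the same spirit as the SUM(DBLE(...)) column in the program above. The following is a minimal sketch and not the original poster's code: the names compare_sums, dp, r_loop, s_sum and ref are assumptions, and the double precision total is treated as a reference only because its rounding error is expected to be far smaller, not because it is exact.

```fortran
program compare_sums
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 131
  real, allocatable :: foo(:,:,:)
  real :: r_loop, s_sum
  real(dp) :: ref
  integer :: i, j, k

  allocate(foo(n, n, n))
  call random_seed()
  call random_number(foo)

  r_loop = 0.0
  ref = 0.0_dp
  do k = 1, n
    do j = 1, n
      do i = 1, n
        ! Single precision running total, strictly in memory order.
        r_loop = r_loop + foo(i, j, k)
        ! Same order, but accumulated in double precision.
        ref = ref + real(foo(i, j, k), dp)
      end do
    end do
  end do

  ! Single precision intrinsic; the summation order is up to the compiler.
  s_sum = sum(foo)

  print '(A,ES16.8)',  'loop (single)  = ', r_loop
  print '(A,ES16.8)',  'SUM  (single)  = ', s_sum
  print '(A,ES24.16)', 'loop (double)  = ', ref
  print '(A,ES12.4)',  'loop - ref     = ', real(r_loop, dp) - ref
  print '(A,ES12.4)',  'SUM  - ref     = ', real(s_sum, dp) - ref
end program compare_sums
```

The array is made allocatable here only to avoid relying on a large static or stack array; the values differ from run to run because random_seed is called without arguments.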
