Optimization defeats robustness measures


Problem description



I am currently investigating robust methods for the summation of arrays, and implemented the algorithm published by Shewchuk in "Adaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates". While the implemented algorithm works as expected in gfortran, ifort optimizes the countermeasures away.

To give some context, here is my code:

module test_mod
contains
  function shewchukSum( array ) result(res)
    implicit none
    real,intent(in) :: array(:)
    real            :: res
    integer         :: xIdx, yIdx, i, nPartials
    real            :: partials(100), hi, lo, x, y

    nPartials = 0
    do xIdx=1,size(array)
      i = 0
      x = array(xIdx)

      ! Calculate the partial sums
      do yIdx=1,nPartials
        y = partials(yIdx)
        hi = x + y
        if ( abs(x) < abs(y) ) then
          lo = x - (hi - y)
        else
          lo = y - (hi - x)
        endif
        x = hi

        ! If a round-off error occurred, store it. Exact comparison intended
        if ( lo == 0. ) cycle
        i = i + 1 ; partials(i) = lo
      enddo ! yIdx
      nPartials = i + 1 ; partials( nPartials ) = x
    enddo ! xIdx

    res = sum( partials(:nPartials) )
  end function
end module

And the calling test program is

program test
  use test_mod
  implicit none
  print *,        sum([1.e0, 1.e16, 1.e0, -1.e16])
  print *,shewchukSum([1.e0, 1.e16, 1.e0, -1.e16])
end program

Compilation with gfortran produces the correct results for all optimization levels:

./a.out 
   0.00000000    
   2.00000000   

ifort, however, produces zeros for all optimizations above -O0:

./a.out 
   0.00000000
   0.00000000

I debugged the code down to the assembly level and found that ifort optimizes away the calculation of lo as well as the operations after if ( lo == 0. ) cycle.

Is there a possibility to force ifort to perform the complete operation for all levels of optimization? This addition is a critical part of the calculations, and I want it to run as fast as possible. For comparison, gfortran at -O2 executes this code approximately eight to ten times faster than ifort at -O0 (measured for arrays of length >100k).

Solution

When it comes to floating point operations, the default for ifort is generally for performance rather than strict correctness.

There are a number of options to control the floating point behaviour. Using ifort 16 and the option -assume protect_parens I get the expected behaviour even at higher optimization levels.
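As a sketch, the compile line could then look like the following (the source file name test.f90 is assumed):

```shell
# Protect parentheses so ifort preserves the evaluation order that
# the two-sum trick relies on, even at higher optimization levels.
ifort -O2 -assume protect_parens test.f90 -o test
./test
```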

Additionally, there are the options -fp-model precise and -fp-model source (the latter implies -assume protect_parens), which may also be of interest to you. The default for -fp-model is fast=1, which

allows value-unsafe optimizations

Naturally, these may have an impact on performance, so other options around the floating point behaviour are also worth considering.

Much further detail can be found in an Intel publication.
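The effect of the value-unsafe reassociation is easy to see in the two-sum kernel itself: under real-number algebra, lo = y - (hi - x) with hi = x + y simplifies to exactly zero, which is precisely the simplification ifort performs above -O0 without protect_parens. As a language-neutral illustration (not part of the original Fortran), here is a minimal Python port of the same algorithm; Python floats are IEEE doubles that the interpreter never reassociates, and the stdlib math.fsum is itself based on Shewchuk's method:

```python
import math

def two_sum(x, y):
    """Shewchuk/Knuth two-sum: returns (hi, lo) with hi + lo == x + y exactly.
    Algebraically lo is always 0; in floating point it captures the
    rounding error of the addition, so it must not be simplified away."""
    hi = x + y
    if abs(x) < abs(y):
        lo = x - (hi - y)
    else:
        lo = y - (hi - x)
    return hi, lo

def shewchuk_sum(array):
    """Maintain a list of non-overlapping partial sums, then add them up."""
    partials = []
    for x in array:
        new_partials = []
        for y in partials:
            x, lo = two_sum(x, y)
            if lo != 0.0:            # exact comparison intended
                new_partials.append(lo)
        new_partials.append(x)
        partials = new_partials
    return sum(partials)

data = [1.0, 1e16, 1.0, -1e16]
print(sum(data))           # 0.0 -- naive summation loses both 1.0 terms
print(shewchuk_sum(data))  # 2.0 -- matches math.fsum(data)
```

This reproduces the question's example: the naive sum returns 0.0 while the compensated sum recovers 2.0, which is what the Fortran version should print when the compiler is prevented from reassociating the expressions for lo.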
