Floating point arithmetic and reproducibility

Question

Is IEEE-754 arithmetic reproducible on different platforms?

I was testing some R code that uses random numbers. I thought that setting the seed of the random number generator on every tested platform would make the tests reproducible, but this does not seem to be true for rexp(), which generates exponentially distributed random numbers.

This is what I get on 32-bit Linux:

options(digits=22) ; set.seed(9) ; rexp(1, 5)
# [1] 0.2806184054728815824298
sessionInfo()
# R version 3.0.2 (2013-09-25)
# Platform: i686-pc-linux-gnu (32-bit)

and this is what I get on 64-bit OS X 10.9:

options(digits=22) ; set.seed(9) ; rexp(1, 5)
# [1] 0.2806184054728815269186
sessionInfo()
# R version 3.0.2 (2013-09-25)
# Platform: x86_64-apple-darwin10.8.0 (64-bit)

64-bit Linux gives the same result as 64-bit OS X, so this seems to be a 32-bit vs. 64-bit issue.

Let us assume that both R builds were compiled with the same GCC version and with the same (default R) compilation flags, and that these flags make the compiler use IEEE-754 arithmetic.

My question is, can this be considered a bug in R? Or is it just a "normal" consequence of using approximate, finite precision floating point arithmetic?

I sent the same question to the R-devel mailing list, but got no answer on the list, and only one answer in private, trying to convince me that this is not a bug and I should live with it.

This is what Wikipedia says about reproducibility in IEEE 754:

The IEEE 754-1985 allowed many variations in implementations (such as the encoding of some values and the detection of certain exceptions). IEEE 754-2008 has tightened up many of these, but a few variations still remain (especially for binary formats). The reproducibility clause recommends that language standards should provide a means to write reproducible programs (i.e., programs that will produce the same result in all implementations of a language), and describes what needs to be done to achieve reproducible results.

This appears in the "Recommendations" section.

My (subjective) opinion is that this is a bug, because the whole point of the IEEE-754 standard is to provide reproducible, platform-independent floating-point arithmetic.

Solution

There are issues with the reproducibility of even elementary floating-point operations in high-level languages, but they are usually controllable with various platform-specific measures, such as setting compiler switches, using custom code to set floating-point controls and modes, or, if necessary, writing essential operations in assembly. As developed in comments, the specific problem you encountered may be that different C implementations use different precision to evaluate intermediate floating-point expressions. On 32-bit x86, for example, GCC by default uses the x87 FPU, whose registers hold 80-bit extended-precision values. On x86-64 it uses SSE2 instructions that round each operation to a 64-bit double. Often this can be controlled by compiler switches or by including casts and assignments in the expressions to require rounding to the nominal type (thus discarding excess precision).

However, the more complicated functions, such as exp and cos, are not routinely reproducible on different platforms. Although the 2008 IEEE-754 standard recommends that these functions be implemented with correct rounding, no math library has yet achieved this with a run time known to be bounded. Nobody in the world has yet done the mathematics needed to do so.

The CRlibm project has implemented some of the functions with known run-time bounds, but the work is incomplete. (Per Pascal Cuoq’s comment, when CRlibm does not have a proven run-time bound for correct rounding, it falls back to a result that is highly likely to be correctly rounded because it is computed with very high precision.) For many functions, it is difficult to figure out how to deliver a correctly rounded result in bounded time and to prove that this is done. (Consider how you might prove that no value of cos(x), where x is any double value, is closer than some small distance e to the midpoint between two representable values. The midpoint matters because it is where rounding switches from returning one result to returning the other. The distance e tells you how accurately and precisely you must calculate an approximation in order to provide correct rounding.)
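The midpoint argument can be written out as a short derivation. This is a sketch, and the notation is introduced here, not taken from the original. RN is round-to-nearest to double, M is the set of midpoints between consecutive doubles, and ŷ is a computed approximation to f(x) with error at most ε. The problem is commonly known as the table maker's dilemma.

```latex
e(x) = \min_{m \in M} \lvert f(x) - m \rvert,
\qquad
e = \min_{x \in \text{doubles}} e(x)

\lvert \hat{y} - f(x) \rvert \le \varepsilon < e(x)
\;\Longrightarrow\;
\text{no } m \in M \text{ lies between } \hat{y} \text{ and } f(x)
\;\Longrightarrow\;
\mathrm{RN}(\hat{y}) = \mathrm{RN}\bigl(f(x)\bigr)
```

Round-to-nearest returns the same double for every point between two adjacent midpoints. Computing every result with error ε < e therefore guarantees correct rounding, assuming no f(x) falls exactly on a midpoint. The hard part is proving how large e is for a function such as cos over all doubles.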

The current state of affairs is that many functions in the math library are approximated with an accuracy looser than correct rounding, and different vendors use different implementations with different approximations. I am supposing that R uses some of these functions in its rexp implementation and that it uses the native libraries of its target platforms, so it gets different results on different platforms.

To remedy this, you might consider using a common math library on the platforms you target (possibly CRlibm).
