IEEE-754 floating-point precision: How much error is allowed?

Problem description

I'm working on porting the sqrt function (for 64-bit doubles) from fdlibm to the model-checking tool I'm currently using (cbmc).
As part of this work I have read a lot about the IEEE-754 standard, but I don't think I understood the precision guarantees for the basic operations (including sqrt).

Testing my port of fdlibm's sqrt, I got the following result from sqrt on a 64-bit double:

sqrt(1977061516825203605555216616167125005658976571589721139027150498657494589171970335387417823661417383745964289845929120708819092392090053015474001800648403714048.0) = 44464159913633855548904943164666890000299422761159637702558734139742800916250624.0

(This case broke a simple post-condition about precision in my tests; I'm no longer sure whether such a post-condition can hold under IEEE-754.)

For comparison, several multi-precision tools calculated something like:

sqrt(1977061516825203605555216616167125005658976571589721139027150498657494589171970335387417823661417383745964289845929120708819092392090053015474001800648403714048.0) =44464159913633852501611468455197640079591886932526256694498106717014555047373210.truncated

One can see that the 17th digit from the left differs, which corresponds to an error of about:

3047293474709469249920707535828633381008060627422728245868877413.0

Question 1: Is this huge amount of error allowed?

The standard says that every basic operation (+, -, *, /, sqrt) should be within 0.5 ulp, meaning that it should equal the mathematically exact result rounded to the nearest floating-point representation (Wikipedia says that some libraries only guarantee 1 ulp, but that isn't important at the moment).
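To make "within 0.5 ulp" concrete at this magnitude, here is a minimal sketch (the long literal is the sqrt result printed above; nextafter from <math.h> returns the neighbouring representable double, so the difference is one ulp; compile with something like gcc test.c -lm):

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        /* The double returned by sqrt above, parsed back from its decimal form. */
        double r = 44464159913633855548904943164666890000299422761159637702558734139742800916250624.0;

        /* One ulp at r: the gap between r and the next representable double. */
        double one_ulp = nextafter(r, INFINITY) - r;

        printf("ulp(r)     = %g\n", one_ulp);       /* about 6.6e63 (2^212) */
        printf("0.5 ulp(r) = %g\n", 0.5 * one_ulp); /* about 3.3e63 */
        return 0;
    }

So at magnitudes around 4.4e79, adjacent doubles are roughly 6.6e63 apart, and "within 0.5 ulp" allows an absolute difference of roughly 3.3e63.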

Question 2: Does that mean that every basic operation should have an error < 2.220446e-16 with 64-bit doubles (machine epsilon)?

I calculated the same value on an x86-32 Linux system (glibc/eglibc) and got the same result as with fdlibm, which leads me to think that either:

  • a: I did something wrong (but how? printf would be a candidate, though I don't know whether that could be the reason; the sketch after this list is one way to check), or
  • b: an error/precision like this is normal for these libraries
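Specifically, printing the result both with %.17g (17 significant decimal digits are enough to round-trip any double) and with %a (hexadecimal floating point, no decimal conversion at all) takes printf out of the equation. A minimal sketch, using the input value from above:

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        /* The input value from the test case above. */
        double x = 1977061516825203605555216616167125005658976571589721139027150498657494589171970335387417823661417383745964289845929120708819092392090053015474001800648403714048.0;
        double r = sqrt(x);

        printf("sqrt(x) = %.17g\n", r); /* 17 significant digits round-trip a double */
        printf("sqrt(x) = %a\n", r);    /* hexadecimal significand and exponent */
        return 0;
    }

If the hexadecimal form matches across platforms, the libraries really are returning the same bit pattern and printf is not the culprit.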

Solution

The IEEE-754 standard requires that the so-called "basic operations" (which include addition, multiplication, division and square root) be correctly rounded. This means that there is a unique allowed answer, and it is the closest representable floating-point number to the so-called "infinitely precise" result of the operation.

In double precision, numbers have 53 binary digits of precision, so the correct answer is the exact answer rounded to 53 significant bits. As Rick Regan showed in his answer, this is exactly the result that you got.

The answers to your questions are:

Question 1: Is this huge amount of error allowed?

Yes, but it is quite misleading to call this error "huge". The fact is that there is no double-precision value that could be returned that would have a smaller error.
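To see this with the numbers from the question, one can compare the reported difference against the distance to the neighbouring double just below the result (a sketch; the literals are the values quoted above, parsed back into doubles, so they are only as exact as that conversion):

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        /* The double returned by sqrt, and its reported distance to the
           multi-precision reference value, which lies below r. */
        double r   = 44464159913633855548904943164666890000299422761159637702558734139742800916250624.0;
        double err = 3047293474709469249920707535828633381008060627422728245868877413.0;

        double below = nextafter(r, 0.0); /* the next representable double below r */
        double gap   = r - below;         /* one ulp, about 6.6e63 */

        printf("|reference - r|     = %g\n", err);       /* about 3.0e63 */
        printf("|reference - below| = %g\n", gap - err); /* about 3.5e63, i.e. farther */
        return 0;
    }

The neighbour just below the result is already farther from the reference value than the result itself (about 3.5e63 versus about 3.0e63), and every other double is farther still, so there is no double with a smaller error.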

Question 2: Does that mean that every basic operation should have an error < 2.220446e-16 with 64-bit doubles (machine epsilon)?

No. It means that every basic operation should be rounded to the (unique) closest representable floating-point number according to the current rounding mode. This is not quite the same as saying that the relative error is bounded by machine epsilon.
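Roughly speaking, the difference between the two statements can be seen by looking at how large half an ulp is relative to the value itself at different points of a binade (a sketch; ulp here is a hypothetical helper built on nextafter, not a standard function):

    #include <float.h>
    #include <math.h>
    #include <stdio.h>

    /* Hypothetical helper: one ulp of x, the gap to the next double above x. */
    static double ulp(double x) {
        return nextafter(x, INFINITY) - x;
    }

    int main(void) {
        printf("DBL_EPSILON          = %g\n", DBL_EPSILON);            /* 2^-52, about 2.22e-16 */
        printf("0.5*ulp(1.0) / 1.0   = %g\n", 0.5 * ulp(1.0) / 1.0);   /* 2^-53, about 1.11e-16 */
        printf("0.5*ulp(1.99) / 1.99 = %g\n", 0.5 * ulp(1.99) / 1.99); /* about 5.6e-17 */
        return 0;
    }

For a normal result, being within 0.5 ulp implies a relative error of at most DBL_EPSILON/2, but the exact bound depends on where in the binade the result falls, which is why "correctly rounded" and "relative error below machine epsilon" are not interchangeable statements.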

Question 3: Which result do you obtain with your x86 hardware and gcc + libc?

The same answer you did, because sqrt is correctly rounded on any reasonable platform.
