IEEE Std 754 floating-point: let t := a - b, does the standard guarantee that a == b + t?


Problem description

Assume that t, a, and b are all double (IEEE Std 754) variables, and that neither a nor b is NaN (though either may be Inf). After t = a - b, is a == b + t necessarily true?

Solution

Absolutely not. One obvious case is a = DBL_MAX, b = -DBL_MAX. Then a - b overflows, so t = INFINITY, and b + t is also INFINITY, which is not equal to a.

What may be more surprising is that the identity can also fail without any overflow. Every such case is one where a - b is inexact, that is, where the subtraction has to round. For example, if a is DBL_EPSILON/4 and b is -1, then a - b is exactly 1 + DBL_EPSILON/4, which rounds to 1 (assuming the default rounding mode), and (a - b) + b is then 0, not DBL_EPSILON/4.
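
Both counterexamples can be checked directly. The following is a minimal sketch and not part of the original answer. The helper name `roundtrip_holds` is hypothetical. The `volatile` qualifiers are there on the assumption that they are enough to force each intermediate to be stored as a real double, so that neither excess precision (for example x87 registers) nor constant folding hides the rounding.

```c
#include <float.h>

/* Hypothetical helper: returns 1 if a == b + (a - b) when every step
   is rounded to double, and 0 otherwise.
   Assumes IEEE 754 doubles and the default round-to-nearest mode. */
int roundtrip_holds(double a, double b)
{
    volatile double t = a - b;     /* may overflow or round */
    volatile double back = b + t;  /* attempt to recover a */
    return a == back;
}
```

Under those assumptions, `roundtrip_holds(DBL_MAX, -DBL_MAX)` and `roundtrip_holds(DBL_EPSILON / 4, -1.0)` both return 0, while `roundtrip_holds(3.0, 1.0)` returns 1 because 3.0 - 1.0 is exact.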

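The reason I mention this second example is that it is the canonical way of forcing rounding to a particular precision in IEEE arithmetic. For instance, if you have a number in the range [0,1) and want to round it to a fixed number of bits after the binary point, you add and then subtract a suitable power of two. With doubles, adding and then subtracting 0x1p49 rounds to the nearest multiple of 2^-3, because that is the spacing of doubles between 2^49 and 2^50. That is three bits after the binary point; 0x1p48 would give four.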
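
The add-then-subtract trick can be sketched as follows. This is an illustration under stated assumptions, not code from the original answer: the function name `round_to_eighths` is hypothetical, and the sketch assumes IEEE 754 binary64 doubles, the default round-to-nearest-even mode, and a compiler that is not allowed to reassociate floating-point arithmetic (so no `-ffast-math`). Just above 2^49 the spacing between adjacent doubles is 2^-3, so the addition itself performs the rounding and the subtraction is exact.

```c
/* Hypothetical helper: rounds x in [0, 1) to the nearest multiple of 2^-3
   by adding and then subtracting 2^49.
   Assumes IEEE 754 binary64 and round-to-nearest-even. */
double round_to_eighths(double x)
{
    /* volatile keeps the compiler from simplifying (x + c) - c to x
       and forces the sum to be stored as a double. */
    volatile double shifted = x + 0x1p49; /* sum is rounded to a multiple of 2^-3 */
    return shifted - 0x1p49;              /* exact subtraction */
}
```

Under those assumptions, `round_to_eighths(0.3)` gives 0.25 and `round_to_eighths(0.7)` gives 0.75. An exact tie such as 0.0625 goes to the even neighbor, which here is 0.0.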
