浮点往返是否总是往返? [英] Does a floating-point reciprocal always round-trip?

查看:170
本文介绍了浮点往返是否总是往返?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

对于IEEE-754算术,对于倒数的最后一个精度是否有保证0或1单位?从这个角度来说,是否存在一个倒数倒数的保证误差?解决方案

[下面的所有内容都假设一个固定的IEEE 754二进制格式,有一些舍入模式的圆到最近的形式。]

由于倒数(计算为 1 / x )是一个基本的算术运算, 1 是完全可以表示的,算术运算保证按标准正确舍入,倒数结果保证在 0.5 单位在真值的最后一位。 (这适用于标准中指定的基本算术运算的任何 。)

值的倒数倒数<$ c $一般来说,c> x 不能保证等于 x 。一个IEEE 754 binary64格式的简单例子:

 >>> x = 1.8 
>>> 1.0 /(1.0 / x)
1.7999999999999998
>>> x == 1.0 /(1.0 / x)
False

然而,可以避免下溢,并且 x 是有限的,非零的,并且完全可以表示,下面的结果是正确的:


  1. 1.0 /(1.0 / x)的值将不同于 x 如果 x 的有效数(归一化为<1>,则不超过1个单位)。

  2. [1.0,2.0)范围内)小于 sqrt(2),则倒数 往返: 1.0 /(1.0 / x)== x


证明示意图:不失一般性,我们可以假定 x 是正数,而比例 x 2的幂,使得它位于 [1.0,2.0)范围内。上面的结果在 x 是2的精确幂的情况下显然是正确的,所以我们假设它不是(这将在下面的第二步中有用)。下面的证明是针对IEEE 754二进制64格式的特定精度给出的,但它可以直接适应任何IEEE 754二进制格式。

1 / x 为倒数前倒数的 true 值,并让 y 出)最接近的可表示binary64浮动到 1 / x 。然后:


  • 既然 y 是最接近< $ c> 1 / x ,并且 y 1 / x binade [0.5,1.0] ,其中连续的浮点数之间的间隔正好是 2 ^ -53 ,我们有 | y - 1 / x | < = 2 ^ -54 。事实上,我们可以做得更好一些:


  • 我们实际上有一个严格的不平等: | y - 1 / x | < 2 ^ -54 。如果 | y - 1 / x | 正好等于 2 ^ -54 ,那么 1 / x 可以在任意精度二进制浮点中精确表示(因为 y 2 ^ -54 是)。但是只有二进制浮点数 x ,其中 1 / x 可以精确地表示为2的幂,而我们已经排除了这种情况。

  • sqrt(2)
    然后 1 / x> x / 2 ,因此(将两者四舍五入到最接近的可表示的浮点数),我们有 y> = x / 2 ,所以 x / y <= 2 现在 x - 1 / y =(y - 1 / x)x / y ,从 | y - 1 / x | x / y (仍然假设 x ),我们得到 | x - 1 / y | < 2 ^ -53 。因此, x 是最接近的可表示的浮点数,它是 1 / y 1 / y 轮到 x ,往返成功。这就完成了第二部分的证明。在一般情况下,
  • x< 2 ,我们有 x / y < 4 ,所以 | x - 1 / y | < 2 ^ -52 。这使得 1 / y 最多距离 x 1 ulp,这完成了第1部分的证明。 >




下面是对 sqrt(2)阈值的示例:using Python ,我们在 [1.0,2.0)范围内取一百万个随机浮点数,并确定那些不能往返的倒数。所有小于 sqrt(2)的样本都会通过往返。

 >>> import random 
>>>样本= [random.uniform(1.0,2.0)for_in range(10 ** 6)]
>>> bad = [x for sample in if if 1.0 /(1.0 / x)!= x]
>>> len(坏)
171279
>>>分钟(坏)
1.4150519879892107
>>>导入数学
>>> math.sqrt(2)
1.4142135623730951

并且演示最大错误不再一般来说(对于binary64格式,在binade [1.0,2.0),最后一个单位是2 ^ -52):

 >>>样本= [random.uniform(1.0,2.0)for_in range(10 ** 6)] 
>>> max(abs(x - 1.0 /(1.0 / x))为样本中的x)
2.220446049250313e-16
>>> 2 ** - 52
2.220446049250313e-16

下面是IEEE 754 binary64格式显示有关避免下溢的限制是必要的:

 >>> x = 1.3e308 
>>> x_roundtrip = 1.0 /(1.0 / x)
>>> x.hex()
'0x1.72409614c1e6ap + 1023'
>>> x_roundtrip.hex()
'0x1.72409614c1e6cp + 1023'

这里<$ c因为 1 / x 小于最小的正常可表示的浮点数,所以$ c> x_roundtrip 结果与最后两个单元不同,所以没有像 x 那样表示精确度。



最后一点:IEEE 754-2008包含十进制浮点类型,我应该提到上面的证明几乎是逐字地转到十进制的情况下,确定了对于小于 sqrt(10)的浮点数,往返发生,而对于一般的十进制浮点数(再次避免溢出和下溢),我们永远不会超过一个单位在最后的地方。然而,要证明关键的不等式 | x - 1 / y | < 1/2 10 ^(1-p)始终是严格的:最终必须显示数量 1 + 16 10 ^(2p-1)永远不是一个方形数字(这是真实的,但它可能在这个网站的范围之外,包括在这里证明)。

 >>> from decimal import Decimal,getcontext 
>>> import random
>>> getcontext()。prec = 6
>>>样本= [+十进制(random.uniform(1.0,10.0))for _ in range(10 ** 6)]
>>>如果1 /(1 / x)!= x]
>>> min(坏)
十进制('3.16782')


For IEEE-754 arithmetic, is there a guarantee of 0 or 1 units in the last place accuracy for reciprocals? From that, is there a guaranteed error-bound on the reciprocal of a reciprocal?

解决方案

[Everything below assumes a fixed IEEE 754 binary format, with some form of round-to-nearest as the rounding-mode.]

Since reciprocal (computed as 1/x) is a basic arithmetic operation, 1 is exactly representable, and the arithmetic operations are guaranteed correctly rounded by the standard, the reciprocal result is guaranteed to be within 0.5 units in the last place of the true value. (This applies to any of the basic arithmetic operations specified in the standard.)

The reciprocal of the reciprocal of a value x is not guaranteed to be equal to x, in general. A quick example with the IEEE 754 binary64 format:

>>> x = 1.8
>>> 1.0 / (1.0 / x)
1.7999999999999998
>>> x == 1.0 / (1.0 / x)
False

However, assuming that overflow and underflow are avoided, and that x is finite, nonzero, and exactly representable, the following results are true:

  1. The value of 1.0 / (1.0 / x) will differ from x by no more than 1 unit in the last place.

  2. If the significand of x (normalised to lie in the range [1.0, 2.0) as usual) is smaller than sqrt(2), then the reciprocal does roundtrip: 1.0 / (1.0 / x) == x.

Sketch of proof: without loss of generality we can assume that x is positive, and scale x by a power of two so that it lies in the range [1.0, 2.0). The results above are clearly true in the case that x is an exact power of two, so let's assume it's not (this will be useful in the second step below). The proof below is given for precision specific to the IEEE 754 binary64 format, but it adapts directly to any IEEE 754 binary format.

Write 1 / x for the true value of the reciprocal, before rounding, and let y be the (unique, as it turns out) nearest representable binary64 float to 1 / x. Then:

  • since y is the closest float to 1 / x, and both y and 1/x are in the binade [0.5, 1.0], where the spacing between successive floats is exactly 2^-53, we have |y - 1/x| <= 2^-54. In fact, we can do a bit better:

  • we actually have a strict inequality above: |y - 1/x| < 2^-54. If |y - 1/x| were exactly equal to 2^-54, then 1/x would be exactly representable in arbitrary-precision binary floating-point (since both y and 2^-54 are). But the only binary floats x for which 1/x is exactly representable at some precision are powers of two, and we already excluded this case.

  • if x < sqrt(2) then 1 / x > x / 2, hence (rounding both to the nearest representable float), we have y >= x / 2, so x / y <= 2.

  • now x - 1/y = (y - 1/x) x/y, and from the bounds on |y - 1/x| and x/y (still assuming that x < sqrt(2)) we get |x - 1/y| < 2^-53. It follows that x is the closest representable float to 1/y, 1/y rounds to x and the roundtrip succeeds. This completes the proof of part 2.

  • in the general case x < 2, we have x / y < 4, so |x - 1/y| < 2^-52. That makes 1/y at most 1 ulp away from x, which completes the proof of part 1.

Here's a demonstration of the sqrt(2) threshold: using Python, we take a million random floats in the range [1.0, 2.0), and identify those that don't roundtrip through the reciprocal. All of the samples smaller than sqrt(2) pass the roundtrip.

>>> import random
>>> samples = [random.uniform(1.0, 2.0) for _ in range(10**6)]
>>> bad = [x for x in samples if 1.0 / (1.0 / x) != x]
>>> len(bad)
171279
>>> min(bad)
1.4150519879892107
>>> import math
>>> math.sqrt(2)
1.4142135623730951

And a demonstration that the maximum error is no more than 1 ulp, in general (for the binary64 format, in the binade [1.0, 2.0), 1 unit in the last place is 2^-52):

>>> samples = [random.uniform(1.0, 2.0) for _ in range(10**6)]
>>> max(abs(x - 1.0 / (1.0 / x)) for x in samples)
2.220446049250313e-16
>>> 2**-52
2.220446049250313e-16

Here's an example with the IEEE 754 binary64 format showing that the restriction about avoiding underflow is necessary:

>>> x = 1.3e308
>>> x_roundtrip = 1.0 / (1.0 / x)
>>> x.hex()
'0x1.72409614c1e6ap+1023'
>>> x_roundtrip.hex()
'0x1.72409614c1e6cp+1023'

Here the x_roundtrip result differs from the original by two units in the last place, because 1 / x was smaller than the smallest normal representable float, and so was not represented to the same precision as x.

Final note: since IEEE 754-2008 covers decimal floating-point types as well, I should mention that the above proof carries over almost verbatim to the decimal case, establishing that for floats with significand smaller than sqrt(10), roundtrip occurs, while for general decimal floats (again avoiding overflow and underflow) we're never off by more than one unit in the last place. However, there's some number-theoretic finesse required to show that the key inequality |x - 1/y| < 1/2 10^(1-p) is always strict: one ends up having to show that the quantity 1 + 16 10^(2p-1) is never a square number (which is true, but it's probably outside the ambit of this site to include the proof here).

>>> from decimal import Decimal, getcontext
>>> import random
>>> getcontext().prec = 6
>>> samples = [+Decimal(random.uniform(1.0, 10.0)) for _ in range(10**6)]
>>> bad = [x for x in samples if 1 / (1 / x) != x]
>>> min(bad)
Decimal('3.16782')

这篇关于浮点往返是否总是往返?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆