Is 3*x+x always exact?


Problem description

Assuming strict IEEE 754 (no excess precision) and round to nearest even mode, is 3*x+x always == 4*x (and thus exact in absence of overflow) and why?

I was not able to exhibit a counter-example, so I went into a lengthy discussion of every possible trailing bit pattern abc and rounding case, but I feel like I could have missed a case, and also missed a simpler demonstration...

I also have an intuition that this could be extended to (2^n-1) x + x == 2^n x and testing every combination of trailing bits in this case is not an option.

We should have (2^n - 1) x == 2^n x - x by property of IEEE 754 as long as n <= 54, but y-x+x == y is not generally true...
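Both observations can be probed empirically. The following is a minimal sketch, assuming Python's `float` is IEEE 754 binary64 evaluated in round-to-nearest-even with no excess precision (usual on mainstream platforms, but an assumption here). The helper name `sample_identity` is made up for illustration, and a random sample is of course not a proof.

```python
import random

def sample_identity(trials=100_000, seed=0):
    """Return the sampled x for which 3*x + x != 4*x (expected to be empty)."""
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        # Spread samples over many binades and both signs, far from overflow.
        x = rng.uniform(-1.0, 1.0) * 2.0 ** rng.randint(-300, 300)
        if 3 * x + x != 4 * x:
            failures.append(x)
    return failures

# y - x + x == y can fail: here the subtraction absorbs y entirely,
# because the spacing of binary64 values near 1e17 is 16.
y, x = 1.0, 1e17
absorbed = (y - x) + x   # evaluates to 0.0, not 1.0
```

An empty list from `sample_identity()` is consistent with the conjecture, while `absorbed` shows why the general `y-x+x == y` shortcut cannot be used to prove it.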

Solution

In the following, math shown in code format is computed with IEEE 754 in round-to-nearest mode, and math not in code format is exact.

Let p be the number of bits in the significand.

Let f be the factor 2^n − 1 for a positive integer n, and let f be exactly representable (n ≤ p).

Let U(x) be the ULP of x. For normal values, U(x) ≤ 2^(1−p)·|x|.
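The ULP bound can be checked directly. This is a small sketch for binary64, assuming Python 3.9 or later for `math.ulp`; the function name `ulp_bound_holds` is hypothetical.

```python
import math
import sys

P = sys.float_info.mant_dig  # number of significand bits, 53 for binary64

def ulp_bound_holds(x):
    """Check U(x) <= 2^(1-p) * |x| for a finite x."""
    return math.ulp(x) <= 2.0 ** (1 - P) * abs(x)
```

The bound is tight at powers of two such as 1.0, and it fails for subnormal inputs such as 5e-324, which is why the statement is restricted to normal values.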

Let t be `f*x`. If `f*x` is subnormal, then it is exactly f·x. If it is normal, then t = f·x + e for some |e| ≤ ½U(f·x) ≤ 2^(−p)·|f·x|. Note that if |e| is exactly half an ULP, then it must equal the lowest bit of x that is set (since otherwise e would have more than one bit set and could not be half of an ULP).

Substituting for f, t = (2^n − 1)·x + e.

t + x = (2^n − 1)·x + e + x = 2^n·x + e.
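The rounding error e and the exact sum can be computed without further rounding by using rational arithmetic. This is an illustrative sketch for binary64; `rounding_error` and the sample values of n and x are assumptions made for the example, not part of the original answer.

```python
import math
from fractions import Fraction

def rounding_error(n, x):
    """Return (t, e) with t the rounded product f*x and e = t - f*x exactly."""
    f = float(2 ** n - 1)      # exactly representable for n <= 53 in binary64
    t = f * x                  # product rounded to nearest
    e = Fraction(t) - Fraction(f) * Fraction(x)   # exact rational difference
    return t, e

n, x = 2, 0.1
t, e = rounding_error(n, x)
exact_sum = Fraction(t) + Fraction(x)     # t + x with no rounding
half_ulp = Fraction(math.ulp(t)) / 2      # upper bound on |e|
```

Here `exact_sum` equals `2**n * Fraction(x) + e`, which is the identity above, and `abs(e)` does not exceed `half_ulp`.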

Consider `t+x`. By the IEEE 754 requirements for round-to-nearest, this must be within ½ ULP of the exact sum t + x, which we know to be 2^n·x + e. Clearly 2^n·x is representable (barring overflow), and |e| ≤ ½U(f·x) ≤ ½U(2^n·x). Therefore `t+x` must be 2^n·x unless |e| is exactly half an ULP and the low bit of x's significand is odd (since an even low bit wins the tie and gives 2^n·x).

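If n is 1, then f is 1, and e is 0. If 2 ≤ n, suppose |e| is exactly ½U(2^n·x). Since |e| ≤ ½U(f·x) ≤ ½U(2^n·x), e is then also exactly half an ULP of f·x, so by the note above |e| equals the lowest set bit of x. The lowest set bit of 2^n·x is therefore 2^n·|e| = 2^(n−1)·U(2^n·x) ≥ 2U(2^n·x), so the low bit of the significand of 2^n·x, which is the bit that decides the tie, is 0. Such ties do occur (in binary64 with n = 2, x = 6004799503160662 gives e = −2 = −½U(4x)), but the even candidate 2^n·x then wins. So a case where |e| is half an ULP and the deciding low bit is odd does not occur.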

Therefore `t+x` must be 2^n·x. (Overflow and NaN are left as an exercise for the reader.)
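The general statement can be spot-checked for every n up to p in binary64. The sketch below is a randomized check, not a proof; the names `identity_holds` and `find_counterexample` are hypothetical, and the sampling range is chosen to stay away from overflow.

```python
import random

def identity_holds(n, x):
    """True if the computed (2^n - 1)*x + x equals 2^n * x in binary64."""
    f = float(2 ** n - 1)          # exactly representable for n <= 53
    return f * x + x == 2.0 ** n * x   # scaling by 2^n is exact here

def find_counterexample(trials=50_000, seed=1):
    """Return the first (n, x) that violates the identity, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 53)
        x = rng.uniform(-1.0, 1.0) * 2.0 ** rng.randint(-300, 300)
        if not identity_holds(n, x):
            return n, x
    return None
```

If the proof is right, `find_counterexample()` returns `None` for any seed.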

Additionally, I tested exhaustively for IEEE-754 32-bit binary floating-point.
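The exhaustive test itself is not shown. As a rough stand-in, the sketch below samples binary32 bit patterns in Python and emulates single precision by rounding binary64 results through `struct`. It assumes the `"f"` conversion in `struct` rounds to nearest even, and it relies on 3·x and t + x being exact in binary64 for binary32 inputs, so each step is rounded only once. All helper names are made up for illustration. Covering all 2^32 patterns this way would be slow in pure Python, so a compiled language would be the better tool for a truly exhaustive run.

```python
import math
import random
import struct

def to_f32(v):
    """Round a Python float to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", v))[0]

def f32_from_bits(bits):
    """Reinterpret a 32-bit integer as a binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

FLT_MAX = f32_from_bits(0x7F7FFFFF)   # largest finite binary32 value

def check_bits(bits):
    """True if 3*x + x == 4*x in emulated binary32, or if x is out of scope."""
    x = f32_from_bits(bits)
    if math.isnan(x) or math.isinf(x) or abs(x) > FLT_MAX / 4:
        return True                   # NaN, infinity and overflow are skipped
    t = to_f32(3.0 * x)               # exact in binary64, then rounded once
    s = to_f32(t + x)                 # exact in binary64, then rounded once
    return s == 4.0 * x               # 4*x is exact in this range

def sample_f32(trials=100_000, seed=2):
    """Return the sampled bit patterns that violate the identity."""
    rng = random.Random(seed)
    patterns = (rng.getrandbits(32) for _ in range(trials))
    return [b for b in patterns if not check_bits(b)]
```

An empty list from `sample_f32()` agrees with the reported exhaustive result, including for subnormal inputs, which the random bit patterns also hit.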
