Is 3*x+x always exact?
Problem description
Assuming strict IEEE 754 (no excess precision) and round-to-nearest-even mode, is `3*x+x` always `== 4*x` (and thus exact in the absence of overflow), and why?
I was not able to exhibit a counter-example, so I went into a lengthy discussion of every possible trailing bit pattern `abc` and rounding case, but I feel like I could have missed a case, and also missed a simpler demonstration...
I also have an intuition that this could be extended to `(2^n-1) x + x == 2^n x`, and testing every combination of trailing bits in this case is not an option.
We should have `(2^n - 1) x == 2^n x - x` by property of IEEE 754 as long as n <= 54, but `y-x+x == y` is not generally true...
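A quick way to probe the conjecture before attempting a proof is a randomized search. The sketch below assumes Python floats are IEEE 754 binary64 evaluated without excess precision (true on mainstream platforms); the helper names `random_finite_double` and `find_counterexample` are made up for illustration. Finding no counter-example proves nothing, it only says where not to look.

```python
import random
import struct

def random_finite_double(rng):
    """Draw a double uniformly from the finite 64-bit patterns."""
    while True:
        (x,) = struct.unpack("<d", struct.pack("<Q", rng.getrandbits(64)))
        if x == x and abs(x) != float("inf"):  # reject NaN and infinities
            return x

def find_counterexample(n, trials=50_000, seed=0):
    """Return some x with (2^n - 1)*x + x != 2^n * x, or None if none is found."""
    rng = random.Random(seed)
    f = float(2**n - 1)  # exactly representable for n <= 53
    g = float(2**n)
    for _ in range(trials):
        x = random_finite_double(rng)
        if abs(g * x) == float("inf"):
            continue  # the question excludes overflow
        if f * x + x != g * x:
            return x
    return None

print(find_counterexample(2))          # searches the 3*x + x == 4*x case
print((1.0 - 1e17) + 1e17 == 1.0)      # y - x + x == y can fail: prints False
```

The last line shows why the identity is not obvious: `1.0 - 1e17` rounds to `-1e17`, so adding `1e17` back gives `0.0`, not `1.0`.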
In the following, math shown in `code format` is computed with IEEE 754 in round-to-nearest mode, and math not in code format is exact.
Let p be the number of bits in the significand.
Let f be the factor $2^n-1$ for a positive integer n, and let it be exactly representable (n ≤ p).
Let U(x) be the ULP of x. For normal values, $U(x) \le 2^{1-p}|x|$.
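The ULP bound can be checked exactly with rational arithmetic. This is a small sketch assuming binary64 and Python 3.9 or later (for `math.ulp`); `ulp_bound_holds` is a hypothetical helper name, not something from the answer.

```python
import math
import sys
from fractions import Fraction

P = sys.float_info.mant_dig  # significand bits p; 53 for binary64

def ulp_bound_holds(x):
    """Check U(x) <= 2^(1-p) * |x| exactly, using rational arithmetic."""
    return Fraction(math.ulp(x)) <= Fraction(abs(x)) / 2 ** (P - 1)

print(ulp_bound_holds(1.0))     # power of two: the bound is met with equality
print(ulp_bound_holds(1.75))    # an ordinary normal value
print(ulp_bound_holds(5e-324))  # smallest subnormal: the bound is not claimed here
```

The subnormal case fails the check, which is why the statement is restricted to normal values.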
Let t be `f*x`. If `f*x` is subnormal, then it is exactly fx. If it is normal, then t = fx+e for some e with $|e| \le \tfrac{1}{2}U(fx) \le 2^{-p}|fx|$. Note that if |e| is exactly half an ULP, then it must equal the lowest bit of x that is set (since otherwise e would have more than one bit set and could not be half of an ULP).
Substituting for f, $t = (2^n-1)x+e$.
$t+x = (2^n-1)x+e+x = 2^n x+e$.
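The error term e can be made concrete by comparing the floating-point product with the exact rational product. The following is a sketch assuming binary64 (p = 53) and values in the normal range; `exact_ulp` and `error_term` are illustrative names introduced here.

```python
from fractions import Fraction

P = 53  # significand bits, binary64 assumed

def exact_ulp(y):
    """ULP of the binade containing the nonzero rational y (normal range assumed)."""
    y = abs(Fraction(y))
    k = y.numerator.bit_length() - y.denominator.bit_length()
    if Fraction(2) ** k > y:
        k -= 1  # now 2^k <= y < 2^(k+1)
    return Fraction(2) ** (k - P + 1)

def error_term(n, x):
    """Return (t, e): t is the float product f*x, e = t - fx computed exactly."""
    f = 2**n - 1
    t = float(f) * x
    e = Fraction(t) - f * Fraction(x)
    return t, e

n, x = 2, 0.1
t, e = error_term(n, x)
fx = (2**n - 1) * Fraction(x)
print(abs(e) <= exact_ulp(fx) / 2)                         # |e| <= (1/2) U(fx)
print(Fraction(t) + Fraction(x) == 2**n * Fraction(x) + e)  # t + x = 2^n x + e
```

Both checks are exact because `Fraction` converts a float without rounding.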
Consider `t+x`. By the IEEE-754 requirements of round-to-nearest, this must be within ½ an ULP of t+x, which we know to be $2^n x+e$. Clearly $2^n x$ is representable (barring overflow), and $|e| \le \tfrac{1}{2}U(fx) \le \tfrac{1}{2}U(2^n x)$. Therefore `t+x` must be $2^n x$ unless |e| is exactly half an ULP and the low bit of x’s significand is odd (since an even low bit wins the tie and gives $2^n x$).
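If n is 1, then f is 1, and e is 0. If 2 ≤ n and the low bit of x’s significand is odd, then by the note above a half-ULP |e| would equal that low bit, which for normal x is $U(x) = 2^{-n}U(2^n x) \le \tfrac{1}{4}U(2^n x) < \tfrac{1}{2}U(2^n x)$. So a case where |e| is half an ULP and x’s low bit is odd does not occur.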
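The tie case can also be observed directly. The sketch below, assuming binary64 and Python 3.9 or later, draws random x in [1, 2) with a known integer significand M, keeps those where |e| is exactly half an ULP of $2^n x$, and reports whether M is always even. The function name `tie_significands` is hypothetical.

```python
import math
import random
from fractions import Fraction

P = 53  # significand bits, binary64 assumed

def tie_significands(n, trials=20_000, seed=1):
    """Collect integer significands M of x in [1, 2) for which the exact sum
    t + x lies exactly half an ULP away from 2^n x."""
    rng = random.Random(seed)
    f = 2**n - 1
    found = []
    for _ in range(trials):
        M = 2 ** (P - 1) + rng.getrandbits(P - 1)  # random P-bit significand
        x = math.ldexp(M, 1 - P)                    # exact: x = M * 2^(1-P)
        t = float(f) * x
        e = Fraction(t) - f * Fraction(x)           # exact rounding error of f*x
        half_ulp = Fraction(math.ulp(2.0**n * x)) / 2
        if abs(e) == half_ulp:
            assert t + x == 2.0**n * x              # the tie still lands on 2^n x
            found.append(M)
    return found

ties = tie_significands(2)
print(len(ties), all(M % 2 == 0 for M in ties))
```

For n = 2 ties are common in this range, so the sample is not vacuous; every one of them should have an even significand, which is what lets round-to-nearest-even pick $2^n x$.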
Therefore `t+x` must be $2^n x$. (Overflow and NaN left as an exercise for the reader.)
Additionally, I tested exhaustively for IEEE-754 32-bit binary floating-point.
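The exhaustive program is not shown, so the following is only a sampled stand-in, not the author's test. It emulates binary32 in Python by rounding through `struct`: for binary32 inputs, `3.0 * x` and `t + x` are exact in binary64, so a single rounding to binary32 gives the correctly rounded binary32 result. The names `f32` and `sweep_f32` and the stride value are assumptions; a full sweep of all 2^32 patterns is better done in a compiled language.

```python
import struct

FLT_MAX = struct.unpack("<f", struct.pack("<I", 0x7F7FFFFF))[0]

def f32(y):
    """Round a double to the nearest binary32 value (returned as a double)."""
    return struct.unpack("<f", struct.pack("<f", y))[0]

def sweep_f32(step=65_537):
    """Return (tested, failures) for 3*x + x == 4*x over sampled binary32 patterns."""
    tested, failures = 0, []
    for bits in range(0, 2**32, step):
        (x,) = struct.unpack("<f", struct.pack("<I", bits))
        if x != x or abs(4.0 * x) > FLT_MAX:
            continue  # skip NaN, infinities, and overflow of 4*x
        t = f32(3.0 * x)  # binary32 product 3*x
        s = f32(t + x)    # binary32 sum t + x
        if s != 4.0 * x:  # 4*x is exact when it does not overflow
            failures.append(bits)
        tested += 1
    return tested, failures

tested, failures = sweep_f32()
print(tested, len(failures))
```

Using an odd stride keeps the low significand bits varied across samples, which matters because the proof turns on the parity of the low bit.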