Precision of calculations after exponent function
I have a question regarding precision of calculations - it is more of a mathematical theory behind programming.
I have a given float number X and the rounding of that number, X', which is accurate to the 10^(-n) decimal place. Now, I would like to know whether, after calculating the exponential function y = 2^(x), the difference between my number and the rounded number stays at the same level of precision. I mean, is
|2^(X) - 2^(X')|
at the level of 10^(-n-1)?
Exponentiation magnifies relative error and, by extension, ulp error. Consider this illustrative example:
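#include <math.h>
#include <stdio.h>
int main (void) {
    float x = 0x1.fffffep6; /* largest float below 128 */
    printf ("x=%a %15.8e exp2(x)=%a %15.8e\n", x, x, exp2f (x), exp2f (x));
    x = nextafterf (x, 0.0f); /* step one ulp toward zero */
    printf ("x=%a %15.8e exp2(x)=%a %15.8e\n", x, x, exp2f (x), exp2f (x));
}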
This will print something like
x=0x1.fffffep+6 1.27999992e+02 exp2(x)=0x1.ffff4ep+127 3.40280562e+38
x=0x1.fffffcp+6 1.27999985e+02 exp2(x)=0x1.fffe9ep+127 3.40278777e+38
The maximum ulp error in the result will be on the same order of magnitude as 2^(number of exponent bits) of the floating-point format used. In this particular example, there are 8 exponent bits in an IEEE-754 float, and a 1 ulp difference in the input translates into an 88 ulp difference in the result (the last hex digit of a float significand printed with %a advances in steps of 2, so the 0xb0 difference in the printed significands is 88 ulps). The relative difference in the arguments is about 6.0e-8, while the relative difference in the results is about 5.2e-6.
A simplified, intuitive way of thinking about this magnification is that, out of the finite number of bits in the significand (mantissa) of the floating-point argument, some contribute only to the magnitude, and thus to the exponent bits, of the result (in the example, these would be the bits representing the integral portion, 127), while the remaining bits contribute to the significand (mantissa) bits of the result.
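If you look at it mathematically, if the original argument is x = n*(1+ε), then e^x = e^(n*(1+ε)) = e^n * e^(n*ε) ≈ e^n * (1 + n*ε). So if n ≈ 128 and ε ≈ 1e-7, then the expected maximum relative error is around 1.28e-5. For base 2 the same argument gives 2^x ≈ 2^n * (1 + n*ln(2)*ε), an amplification factor of about 88.7 for n ≈ 128.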