Accuracy of floating point arithmetic

Question

I'm having trouble understanding the output of this program:

#include <stdio.h>

int main(void)
{
    double x = 1.8939201459282359e-308;  /* below DBL_MIN, so itself a denormal */
    double y = 4.9406564584124654e-324;  /* smallest positive denormal double */

    /* The products on their own */
    printf("%23.16e\n", 1.6*y);
    printf("%23.16e\n", 1.7*y);
    printf("%23.16e\n", 1.8*y);
    printf("%23.16e\n", 1.9*y);
    printf("%23.16e\n", 2.0*y);

    /* The same products added to x */
    printf("%23.16e\n", x + 1.6*y);
    printf("%23.16e\n", x + 1.7*y);
    printf("%23.16e\n", x + 1.8*y);
    printf("%23.16e\n", x + 1.9*y);
    printf("%23.16e\n", x + 2.0*y);
    return 0;
}

The output is:

9.8813129168249309e-324
9.8813129168249309e-324
9.8813129168249309e-324
9.8813129168249309e-324
9.8813129168249309e-324
1.8939201459282364e-308
1.8939201459282364e-308
1.8939201459282369e-308
1.8939201459282369e-308
1.8939201459282369e-308

I'm using IEEE arithmetic. The variable y holds the smallest positive IEEE double (a denormal). The first five prints show a number which is twice y, as I would expect. What confuses me is that the next five prints show different numbers. If 1.6*y is the same as 2.0*y, then how can x + 1.6*y be different from x + 2.0*y?

Answer

In a nutshell

You say that your compiler is Visual C++ 2010 Express. I do not have access to this compiler, but I understand that it generates programs that initially configure the x87 FPU to use 53 bits of precision, in order to emulate IEEE 754 double-precision computations as closely as possible.

Unfortunately, "as closely as possible" is not always close enough. Historical 80-bit floating-point registers can have their significand limited in width for the purpose of emulating double-precision, but they always retain a full range for the exponent. The difference shows in particular when manipulating denormals (like your y).

My explanation would be that in printf("%23.16e\n", 1.6*y), the product 1.6*y is computed as an 80-bit reduced-significand full-exponent number (it is thus a normal number), then converted to IEEE 754 double-precision (resulting in a denormal), then printed.

On the other hand, in printf("%23.16e\n", x + 1.6*y), the whole expression x + 1.6*y is computed with 80-bit reduced-significand full-exponent numbers (again, all intermediate results are normal numbers), so the product is never rounded to a double on its own. Only the final sum is converted to IEEE 754 double-precision and then printed.

This would explain why 1.6*y prints the same as 2.0*y but has a different effect when added to x. The number that is printed is a double-precision denormal. The number that is added to x is an 80-bit reduced-significand full-exponent normal number (not the same one).

Other compilers, like GCC, do not configure the x87 FPU to manipulate 53-bit significands. This can have the same consequences: x + 1.6*y would be computed entirely with 80-bit full-significand full-exponent numbers, and only then converted to double-precision for printing or storing in memory. With this configuration the issue is noticeable even more often, since you do not need to involve denormals or infinities to notice differences.

This article by David Monniaux contains all the details you may wish for and more.

To get rid of the problem (if you consider it to be one), find the flag that tells your compiler to generate SSE2 instructions for floating-point. These implement exactly IEEE 754 semantics for single- and double-precision.
