Why does this expression cause a floating point error?

Question

So floating-point operations are inexact, but that doesn't fully explain what's going on here:

[46] pry(main)> a=0.05
=> 0.05
[47] pry(main)> a=a*26.0/65
=> 0.02

So here we have what we expect: we get the right answer and the world keeps turning beautifully. But we later rewrite this function, and while we do this we swap the line a=a*26.0/65 for a*=26.0/65. Isn't that nice, we typed one less character! Let's see how that's worked out for us:

[48] pry(main)> a=0.05
=> 0.05
[49] pry(main)> a*=26.0/65
=> 0.020000000000000004
[50] pry(main)> 26.0/65
=> 0.4

It shows that a*=b is not the same as writing a=a*b. It doesn't seem to be a normal float rounding error, because none of these numbers should need to be rounded as a float (the mantissa should be more than long enough for each of 26.0, 26.0/65, 65.0).

I'm sure there's something subtle going on under the hood, and I'd like to know what it is.

Answer

It is not true that the significand of the floating-point format has enough bits to represent 26/65. ("Significand" is the preferred term. Significands are linear. Mantissas are logarithmic.)

The significand of a binary floating-point number is a binary integer. This integer is scaled according to the exponent. To represent 26/65, which is .4, in binary floating-point, we must represent it as an integer multiplied by a power of two. For example, an approximation to .4 is $1 \cdot 2^{-1} = .5$. A better approximation is $3 \cdot 2^{-3} = .375$. Better still is $26 \cdot 2^{-6} = .40625$.

However, no matter what integer you use for the significand or what exponent you use, this format can never represent exactly .4. Suppose you had $.4 = f \cdot 2^{e}$, where $f$ and $e$ are integers. Then $2/5 = f \cdot 2^{e}$, so $2/(5f) = 2^{e}$, and then $1/(5f) = 2^{e-1}$ and $5f = 2^{1-e}$. Since $5f$ is a nonzero integer, $2^{1-e}$ would have to be an integer divisible by 5; that is, 5 would have to divide a power of two. It does not, so you cannot have $.4 = f \cdot 2^{e}$.

In IEEE-754 64-bit binary floating-point, the significand has 53 bits. With this, the closest representable value to .4 is 0.40000000000000002220446049250313080847263336181640625, which equals $3602879701896397 \cdot 2^{-53}$.
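
One way to check this from Ruby is to ask for the exact fraction behind the Float. This is only a sketch: `power_of_two?` is a hypothetical helper introduced here for illustration, and it assumes your Ruby uses IEEE-754 64-bit doubles for Float, which is the usual case.

```ruby
# Hypothetical helper (not from the question): true when n is a positive power of two.
def power_of_two?(n)
  n > 0 && (n & (n - 1)).zero?
end

# Float#to_r returns the exact value stored in the Float, with no rounding.
stored = 0.4.to_r

puts stored                                       # numerator/denominator of the stored value
puts power_of_two?(stored.denominator)            # true: the denominator is a power of two
puts stored == Rational(2, 5)                     # false: the stored value is not exactly 2/5
puts stored == Rational(3602879701896397, 2**53)  # true: the significand and scale quoted above
puts format('%.55f', 0.4)                         # prints many decimal digits of the stored value
```

Because the denominator of any Float is a power of two, the fraction 2/5 can never appear here, which is the argument above in executable form.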

Now let us look at your calculations. In a=0.05, 0.05 is converted to floating-point, which produces 0.05000000000000000277555756156289135105907917022705078125.

In a*26.0/65, a*26.0 is evaluated first. The exact mathematical result is rounded to the nearest representable value, producing 1.3000000000000000444089209850062616169452667236328125. Then this is divided by 65. Again, the answer is rounded, producing 0.0200000000000000004163336342344337026588618755340576171875. When Ruby prints this value, it apparently decides it is close enough to .02 that it can just display ".02" and not the complete value. This is reasonable in the sense that, if you convert the printed value .02 back to floating-point, you get the actual value again, 0.0200000000000000004163336342344337026588618755340576171875. So ".02" is in some sense a good representative for 0.0200000000000000004163336342344337026588618755340576171875.
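
The two roundings in this expression can be observed separately. The following is a minimal sketch, again assuming IEEE-754 doubles; `left_to_right` is a hypothetical helper name used only for this illustration.

```ruby
# Hypothetical helper mirroring a = a * 26.0 / 65, one operation at a time.
def left_to_right(a)
  product = a * 26.0        # first rounding happens here
  quotient = product / 65   # second rounding happens here
  [product, quotient]
end

product, quotient = left_to_right(0.05)

# Float#to_r is exact, so Rational arithmetic gives the unrounded product.
exact_product = 0.05.to_r * 26

puts product.to_r == exact_product  # false: the exact product had to be rounded
puts product == 1.3                 # true: it rounded to the same Float as the literal 1.3
puts quotient == 0.02               # true: same Float as the literal 0.02
puts quotient                       # 0.02
```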

In your alternative expression, you have a*=26.0/65. In this, 26.0/65 is evaluated first. This produces 0.40000000000000002220446049250313080847263336181640625. This is different from the first expression because you have performed the operations in a different order, so a different number was rounded. It may have happened that a value in the first expression was rounded down whereas this different value, because of where it happened to land relative to values representable in floating-point, rounded up.
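
To make the grouping explicit, here is a small sketch comparing the compound assignment with both parenthesized forms. `compound` is a hypothetical helper, not something from the question.

```ruby
# Hypothetical helper using the compound assignment from the question.
def compound(a)
  a *= 26.0 / 65   # evaluated as a = a * (26.0 / 65)
  a
end

factor = 26.0 / 65

puts factor == 0.4                         # true: the quotient is the Float nearest to .4
puts compound(0.05) == 0.05 * factor       # true: the right-hand side is evaluated first
puts compound(0.05) == (0.05 * 26.0) / 65  # false: a different intermediate value was rounded
```

So the two lines in the question differ only in where the parentheses effectively sit, and that is enough to change which intermediate result gets rounded.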

Then the value is multiplied by a. This produces 0.02000000000000000388578058618804789148271083831787109375. Note that this value is further from .02 than the result of the first expression. Your implementation of Ruby knows this, so it determines that printing ".02" is not enough to represent it accurately. Instead, it displays more digits, showing 0.020000000000000004.
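
As a rough check of the last two points, the sketch below compares the two results directly. It assumes IEEE-754 doubles and a Ruby whose `Float#to_s` prints the shortest string that converts back to the same Float, which matches the behaviour shown in the question; the variable names are illustrative only.

```ruby
first = 0.05 * 26.0 / 65       # a = a * 26.0 / 65
second = 0.05 * (26.0 / 65)    # a *= 26.0 / 65

# The two results are adjacent Floats: nothing representable lies between them.
puts second == first.next_float   # true

# Each printed string converts back to the Float it came from.
puts first.to_s                   # 0.02
puts second.to_s                  # 0.020000000000000004
puts first.to_s.to_f == first     # true
puts second.to_s.to_f == second   # true
puts "0.02".to_f == second        # false: "0.02" would not identify the second value

# Distance from the exact decimal value 2/100, computed without rounding.
target = Rational(2, 100)
first_error = (first.to_r - target).abs
second_error = (second.to_r - target).abs
puts first_error < second_error   # true: the second result is further from .02
```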
