Why Are Floating Point Numbers Inaccurate?


Problem Description

Why do some numbers lose accuracy when stored as floating point numbers?

For example, the decimal number 9.2 can be expressed exactly as a ratio of two decimal integers (92/10), both of which can be expressed exactly in binary (0b1011100/0b1010). However, the same ratio stored as a floating point number is never exactly equal to 9.2:

    32-bit "single precision" float: 9.19999980926513671875
    64-bit "double precision" float: 9.199999999999999289457264239899814128875732421875

How can such an apparently simple number be "too big" to express in 64 bits of memory?

Solution

In most programming languages, floating point numbers are represented a lot like scientific notation: with an exponent and a mantissa (also called the significand). A very simple number, say 9.2, is actually this fraction:

    5179139571476070 x 2^-49

Where the exponent is -49 and the mantissa is 5179139571476070. The reason it is impossible to represent some decimal numbers this way is that both the exponent and the mantissa must be integers. In other words, all floats must be an integer multiplied by an integer power of 2.

9.2 may be simply 92/10, but 10 cannot be expressed as 2^n if n is limited to integer values.


Seeing the Data

First, a function to see the components that make up 32- and 64-bit floats. Gloss over it if you only care about the output (example in Python):

    import struct
    from itertools import islice

    def float_to_bin_parts(number, bits=64):
        if bits == 32:          # single precision
            int_pack      = 'I'
            float_pack    = 'f'
            exponent_bits = 8
            mantissa_bits = 23
            exponent_bias = 127
        elif bits == 64:        # double precision. all python floats are this
            int_pack      = 'Q'
            float_pack    = 'd'
            exponent_bits = 11
            mantissa_bits = 52
            exponent_bias = 1023
        else:
            raise ValueError('bits argument must be 32 or 64')
        bin_iter = iter(bin(struct.unpack(int_pack, struct.pack(float_pack, number))[0])[2:].rjust(bits, '0'))
        return [''.join(islice(bin_iter, x)) for x in (1, exponent_bits, mantissa_bits)]

There's a lot of complexity behind that function, and it'd be quite the tangent to explain, but if you're interested, the important resource for our purposes is the struct module.

Python's float is a 64-bit, double-precision number. In other languages such as C, C++, Java and C#, double precision is a separate type, double, which is often implemented as 64 bits.

When we call that function with our example, 9.2, here's what we get:

    >>> float_to_bin_parts(9.2)
    ['0', '10000000010', '0010011001100110011001100110011001100110011001100110']


Interpreting the Data

You'll see I've split the return value into three components. These components are:

• Sign
• Exponent
• Mantissa (also called Significand, or Fraction)

Sign

The sign is stored in the first component as a single bit. It's easy to explain: 0 means the float is a positive number; 1 means it's negative. Because 9.2 is positive, our sign value is 0.

Exponent

The exponent is stored in the middle component as 11 bits. In our case, 0b10000000010. In decimal, that represents the value 1026. A quirk of this component is that you must subtract a number equal to 2^(# of bits - 1) - 1 to get the true exponent; in our case, that means subtracting 0b1111111111 (decimal number 1023) to get the true exponent, 0b00000000011 (decimal number 3).
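As a quick sanity check in plain Python, using the exponent bit string printed above, the bias rule works out like this:

```python
exponent_bits = '10000000010'   # middle component for 9.2, from above

# bias = 2^(# of bits - 1) - 1; with 11 bits that is 2^10 - 1 = 1023
bias = 2**(len(exponent_bits) - 1) - 1
true_exponent = int(exponent_bits, 2) - bias
print(int(exponent_bits, 2), bias, true_exponent)   # 1026 1023 3
```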

Mantissa

The mantissa is stored in the third component as 52 bits. However, there's a quirk to this component as well. To understand this quirk, consider a number in scientific notation, like this:

    6.0221413 x 10^23

The mantissa would be the 6.0221413. Recall that the mantissa in scientific notation always begins with a single non-zero digit. The same holds true for binary, except that binary only has two digits: 0 and 1. So the binary mantissa always starts with 1! When a float is stored, the 1 at the front of the binary mantissa is omitted to save space; we have to place it back at the front of our third element to get the true mantissa:

    1.0010011001100110011001100110011001100110011001100110

This involves more than just a simple addition, because the bits stored in our third component actually represent the fractional part of the mantissa, to the right of the radix point.

When dealing with decimal numbers, we "move the decimal point" by multiplying or dividing by powers of 10. In binary, we can do the same thing by multiplying or dividing by powers of 2. Since our third element has 52 bits, we divide it by 2^52 to move it 52 places to the right:

    0.0010011001100110011001100110011001100110011001100110

In decimal notation, that's the same as dividing 675539944105574 by 4503599627370496 to get 0.1499999999999999. (This is one example of a ratio that can be expressed exactly in binary, but only approximately in decimal; for more detail, see: 675539944105574 / 4503599627370496.)

Now that we've transformed the third component into a fractional number, adding 1 gives the true mantissa.
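The same arithmetic, spelled out in Python for the bit string above (the division by 2^52 is the 52-place shift):

```python
mantissa_bits = '0010011001100110011001100110011001100110011001100110'

# interpret the 52 bits as an integer, then shift it 52 places to the right
fraction = int(mantissa_bits, 2) / 2**len(mantissa_bits)
print(int(mantissa_bits, 2))    # 675539944105574
true_mantissa = 1 + fraction    # restore the implicit leading 1
print(true_mantissa * 2**3)     # 9.2 -- mantissa times 2^exponent
```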

Recapping the Components

• Sign (first component): 0 for positive, 1 for negative
• Exponent (middle component): Subtract 2^(# of bits - 1) - 1 to get the true exponent
• Mantissa (last component): Divide by 2^(# of bits) and add 1 to get the true mantissa
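Those three rules can be collected into a small helper. (`decode_parts` is a name made up for this sketch, not part of the answer's code; it only handles normal, non-zero floats — no subnormals, infinities, or NaNs.)

```python
def decode_parts(sign, exponent, mantissa):
    """Rebuild a float from the three bit strings (normal numbers only)."""
    bias = 2**(len(exponent) - 1) - 1              # 1023 for 64-bit floats
    true_exponent = int(exponent, 2) - bias
    true_mantissa = 1 + int(mantissa, 2) / 2**len(mantissa)
    return (-1)**int(sign, 2) * true_mantissa * 2**true_exponent

print(decode_parts('0', '10000000010',
                   '0010011001100110011001100110011001100110011001100110'))
# 9.2
```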

Calculating the Number

Putting all three parts together, we're given this binary number:

    1.0010011001100110011001100110011001100110011001100110 x 10^11

Which we can then convert from binary to decimal:

    1.1499999999999999 x 2^3 (inexact!)

And multiply to reveal the final representation of the number we started with (9.2) after being stored as a floating point value:

    9.1999999999999993


Representing as a Fraction

9.2

Now that we've built the number, it's possible to reconstruct it into a simple fraction:

    1.0010011001100110011001100110011001100110011001100110 x 10^11

Shift the mantissa to a whole number:

    10010011001100110011001100110011001100110011001100110 x 10^(11-110100)

Convert to decimal:

    5179139571476070 x 2^(3-52)

Subtract the exponent:

    5179139571476070 x 2^-49

Turn the negative exponent into division:

    5179139571476070 / 2^49

Multiply out the exponent:

    5179139571476070 / 562949953421312

Which equals:

    9.1999999999999993
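Both forms of the result can be confirmed in Python: dividing those two integers lands exactly on the stored float, and `decimal.Decimal` prints its exact decimal expansion (the long string quoted at the top of this answer):

```python
from decimal import Decimal

# float division reproduces the stored value exactly
print(5179139571476070 / 562949953421312)   # 9.2
# Decimal(float) converts the stored bits exactly, digit for digit
print(Decimal(9.2))
# 9.199999999999999289457264239899814128875732421875
```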

9.5

    >>> float_to_bin_parts(9.5)
    ['0', '10000000010', '0011000000000000000000000000000000000000000000000000']

Already you can see the mantissa is only 4 digits followed by a whole lot of zeroes. But let's go through the paces.

Assemble the binary scientific notation:

    1.0011 x 10^11

Shift the decimal point:

    10011 x 10^(11-100)

Subtract the exponent:

    10011 x 10^-1

Binary to decimal:

    19 x 2^-1

Negative exponent to division:

    19 / 2^1

Multiply out the exponent:

    19 / 2

Equals:

    9.5



