When using doubles, why isn't (x / (y * z)) the same as (x / y / z)?


Question


This is partly academic, as for my purposes I only need it rounded to two decimal places; but I am keen to know what is going on to produce two slightly different results.

This is the test that I wrote to narrow it to the simplest implementation:

// Imports assume JUnit 4 with AssertJ; the original may have used FEST-Assert instead
import org.junit.Test;
import static org.assertj.core.api.Assertions.assertThat;

@Test
public void shouldEqual() {
  double expected = 450.00d / (7d * 60);  // 1.0714285714285714
  double actual = 450.00d / 7d / 60;      // 1.0714285714285716

  assertThat(actual).isEqualTo(expected);
}

But it fails with this output:

org.junit.ComparisonFailure: 
Expected :1.0714285714285714
Actual   :1.0714285714285716

Can anyone explain in detail what is going on under the hood to result in the value at 1.000000000000000X being different?

Some of the points I'm looking for in an answer are:

  • Where is the precision lost?
  • Which method is preferred, and why?
  • Which is actually correct? (In pure maths, both can't be right. Perhaps both are wrong?)
  • Is there a better solution or method for these arithmetic operations?

Solution

I see a bunch of questions that tell you how to work around this problem, but not one that really explains what's going on, other than "floating-point roundoff error is bad, m'kay?" So let me take a shot at it. Let me first point out that nothing in this answer is specific to Java. Roundoff error is a problem inherent to any fixed-precision representation of numbers, so you get the same issues in, say, C.

Roundoff error in a decimal data type

As a simplified example, imagine we have some sort of computer that natively uses an unsigned decimal data type, let's call it float6d. The length of the data type is 6 digits: 4 dedicated to the mantissa, and 2 dedicated to the exponent. For example, the number 3.142 can be expressed as

3.142 x 10^0

which would be stored in 6 digits as

503142

The first two digits are the exponent plus 50, and the last four are the mantissa. This data type can represent any number from 0.001 x 10^-50 to 9.999 x 10^+49.

Actually, that's not true. It can't store any number. What if you want to represent 3.141592? Or 3.1415926535? Or 3.14159265358979? Tough luck, the data type can't store more than four digits of precision, so the compiler has to round anything with more digits to fit into the constraints of the data type. If you write

float6d x = 3.141592;
float6d y = 3.1415926535;
float6d z = 3.14159265358979;

then the compiler converts each of these three values to the same internal representation, 3.142 x 10^0 (which, remember, is stored as 503142), so that x == y == z will hold true.

The point is that there is a whole range of real numbers which all map to the same underlying sequence of digits (or bits, in a real computer). Specifically, any x satisfying 3.1415 <= x <= 3.1425 (assuming half-even rounding) gets converted to the representation 503142 for storage in memory.
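
The float6d type is made up, of course, but its behavior is easy to imitate in Java: a BigDecimal rounded with a 4-digit MathContext and HALF_EVEN rounding reproduces the "four significant digits, round half to even" storage described above. A minimal sketch (the class and method names are just illustrative):

import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

public class Float6dStorageDemo {
    // Keep only 4 significant decimal digits, rounding half to even,
    // to imitate what storing a value into a float6d would do to it.
    static BigDecimal storeAsFloat6d(String value) {
        return new BigDecimal(value).round(new MathContext(4, RoundingMode.HALF_EVEN));
    }

    public static void main(String[] args) {
        System.out.println(storeAsFloat6d("3.141592"));         // 3.142
        System.out.println(storeAsFloat6d("3.1415926535"));     // 3.142
        System.out.println(storeAsFloat6d("3.14159265358979")); // 3.142
        System.out.println(storeAsFloat6d("3.1415"));           // 3.142 (low end of the range)
        System.out.println(storeAsFloat6d("3.1425"));           // 3.142 (half-even keeps the even digit)
    }
}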

This rounding happens every time your program stores a floating-point value in memory. The first time it happens is when you write a constant in your source code, as I did above with x, y, and z. It happens again whenever you do an arithmetic operation that increases the number of digits of precision beyond what the data type can represent. Either of these effects is called roundoff error. There are a few different ways this can happen:

  • Addition and subtraction: if one of the values you're adding has a different exponent from the other, you will wind up with extra digits of precision, and if there are enough of them, the least significant ones will need to be dropped. For example, 2.718 and 121.0 are both values that can be exactly represented in the float6d data type. But if you try to add them together:

       1.210     x 10^2
    +  0.02718   x 10^2
    -------------------
       1.23718   x 10^2
    

    which gets rounded off to 1.237 x 10^2, or 123.7, dropping two digits of precision.

  • Multiplication: the number of digits in the result is approximately the sum of the number of digits in the two operands. This will produce some amount of roundoff error, if your operands already have many significant digits. For example, 121 x 2.718 gives you

       1.210     x 10^2
    x  0.02718   x 10^2
    -------------------
       3.28878   x 10^2
    

    which gets rounded off to 3.289 x 10^2, or 328.9, again dropping two digits of precision.

    However, it's useful to keep in mind that, if your operands are "nice" numbers, without many significant digits, the floating-point format can probably represent the result exactly, so you don't have to deal with roundoff error. For example, 2.3 x 140 gives

       1.40      x 10^2
    x  0.23      x 10^2
    -------------------
       3.22      x 10^2
    

    which has no roundoff problems.

  • Division: this is where things get messy. Division will pretty much always result in some amount of roundoff error unless the number you're dividing by happens to be a power of the base (in which case the division is just a digit shift, or bit shift in binary). As an example, take two very simple numbers, 3 and 7, divide them, and you get

       3.                x 10^0
    /  7.                x 10^0
    ----------------------------
       0.428571428571... x 10^0
    

    The closest value to this number which can be represented as a float6d is 4.286 x 10^-1, or 0.4286, which distinctly differs from the exact result.

As we'll see in the next section, the error introduced by rounding grows with each operation you do. So if you're working with "nice" numbers, as in your example, it's generally best to do the division operations as late as possible because those are the operations most likely to introduce roundoff error into your program where none existed before.
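
For what it's worth, the same decimal arithmetic can be replayed in Java by performing each operation in a 4-significant-digit MathContext with half-even rounding. This is only a sketch to make the worked examples above concrete; it is not how double arithmetic works internally:

import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

public class Float6dArithmeticDemo {
    // 4 significant decimal digits, half-even rounding, as in the float6d examples above
    static final MathContext FLOAT6D = new MathContext(4, RoundingMode.HALF_EVEN);

    public static void main(String[] args) {
        // Addition: 121.0 + 2.718 = 123.718, rounded to 123.7 (two digits dropped)
        System.out.println(new BigDecimal("121.0").add(new BigDecimal("2.718"), FLOAT6D));

        // Multiplication: 121 x 2.718 = 328.878, rounded to 328.9
        System.out.println(new BigDecimal("121").multiply(new BigDecimal("2.718"), FLOAT6D));

        // Multiplication of "nice" numbers: 2.3 x 140 = 322, no rounding needed
        System.out.println(new BigDecimal("2.3").multiply(new BigDecimal("140"), FLOAT6D));

        // Division: 3 / 7 = 0.428571..., rounded to 0.4286
        System.out.println(new BigDecimal("3").divide(new BigDecimal("7"), FLOAT6D));
    }
}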

Analysis of roundoff error

In general, if you can't assume your numbers are "nice", roundoff error can be either positive or negative, and it's very difficult to predict which direction it will go just based on the operation; it depends on the specific values involved. Consider, for example, a plot of the roundoff error in 2.718 × z as a function of z (still using the float6d data type).

In practice, when you're working with values that use the full precision of your data type, it's often easier to treat roundoff error as a random error. Looking at such a plot, you might be able to guess that the magnitude of the error depends on the order of magnitude of the result of the operation. In this particular case, when z is of order 10^-1, 2.718 × z is also of order 10^-1, so it will be a number of the form 0.XXXX. The maximum roundoff error is then half of the last digit of precision; in this case, by "the last digit of precision" I mean 0.0001, so the roundoff error varies between -0.00005 and +0.00005. At the point where 2.718 × z jumps up to the next order of magnitude, which happens at z = 1/2.718 = 0.3679, you can see that the roundoff error also jumps up by an order of magnitude.

You can use well-known techniques of error analysis to analyze how a random (or unpredictable) error of a certain magnitude affects your result. Specifically, for multiplication or division, the "average" relative error in your result can be approximated by adding the relative error in each of the operands in quadrature - that is, square them, add them, and take the square root. With our float6d data type, the relative error varies between 0.0005 (for a value like 0.101) and 0.00005 (for a value like 0.995).

Let's take 0.0001 as a rough average for the relative error in values x and y. The relative error in x * y or x / y is then given by

sqrt(0.0001^2 + 0.0001^2) = 0.0001414

which is a factor of sqrt(2) larger than the relative error in each of the individual values.

When it comes to combining operations, you can apply this formula multiple times, once for each floating-point operation. So for instance, for z / (x * y), the relative error in x * y is, on average, 0.0001414 (in this decimal example) and then the relative error in z / (x * y) is

sqrt(0.0001^2 + 0.0001414^2) = 0.0001732

Notice that the average relative error grows with each operation, specifically as the square root of the number of multiplications and divisions you do.

Similarly, for z / x * y, the average relative error in z / x is 0.0001414, and the relative error in z / x * y is

sqrt(0.0001414^2 + 0.0001^2) = 0.0001732

So, the same, in this case. This means that for arbitrary values, on average, the two expressions introduce approximately the same error. (In theory, that is. I've seen these operations behave very differently in practice, but that's another story.)
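
If it helps to see that bookkeeping spelled out, here is a tiny helper (purely illustrative, not from the answer) that combines relative errors in quadrature exactly as the formulas above do:

public class QuadratureError {
    // Combine independent relative errors in quadrature: sqrt(a^2 + b^2 + ...)
    static double combine(double... relativeErrors) {
        double sumOfSquares = 0;
        for (double e : relativeErrors) {
            sumOfSquares += e * e;
        }
        return Math.sqrt(sumOfSquares);
    }

    public static void main(String[] args) {
        double single = 1e-4;                        // rough average relative error per value
        double product = combine(single, single);    // error in x * y:       ~1.414e-4
        double combined = combine(single, product);  // error in z / (x * y): ~1.732e-4
        System.out.println(product);
        System.out.println(combined);
    }
}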

Gory details

You might be curious about the specific calculation you presented in the question, not just an average. For that analysis, let's switch to the real world of binary arithmetic. Floating-point numbers in most systems and languages are represented using IEEE standard 754. For 64-bit numbers, the format specifies 52 bits dedicated to the mantissa, 11 to the exponent, and one to the sign. In other words, when written in base 2, a floating point number is a value of the form

1.1100000000000000000000000000000000000000000000000000 x 2^00000000010
                       52 bits                             11 bits

The leading 1 is not explicitly stored, and constitutes a 53rd bit. Also, you should note that the 11 bits stored to represent the exponent are actually the real exponent plus 1023. For example, this particular value is 7, which is 1.75 x 2^2. The mantissa is 1.75 in binary, or 1.11, and the exponent is 1023 + 2 = 1025 in binary, or 10000000001, so the content stored in memory is

0100000000011100000000000000000000000000000000000000000000000000
 ^          ^
 exponent   mantissa

but that doesn't really matter.

Your example also involves 450,

1.1100001000000000000000000000000000000000000000000000 x 2^00000001000

and 60,

1.1110000000000000000000000000000000000000000000000000 x 2^00000000101

You can play around with these values using any of the many IEEE 754 converters available online.
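
If you would rather stay in Java than use a web converter, Double.doubleToLongBits exposes exactly this layout; here is a small sketch (the dump helper is mine, not a standard API):

public class DoubleBitsDemo {
    // Print the sign, biased exponent, and 52-bit fraction of a double,
    // as laid out by IEEE 754 and returned by Double.doubleToLongBits.
    static void dump(double value) {
        long bits = Double.doubleToLongBits(value);
        long sign = (bits >>> 63) & 0x1L;
        long exponent = (bits >>> 52) & 0x7FFL;  // biased: real exponent + 1023
        long fraction = bits & 0xFFFFFFFFFFFFFL; // 52 bits; the leading 1 is implicit
        String fractionBits = String.format("%52s", Long.toBinaryString(fraction)).replace(' ', '0');
        System.out.println(value + ": sign=" + sign
                + " exponent=" + Long.toBinaryString(exponent) + " (" + (exponent - 1023) + ")"
                + " fraction=" + fractionBits);
    }

    public static void main(String[] args) {
        dump(7.0);   // biased exponent 10000000001 (real exponent 2), fraction 1100...0
        dump(450.0); // biased exponent 10000000111 (real exponent 8), fraction 1100001...0
        dump(60.0);  // biased exponent 10000000100 (real exponent 5), fraction 111...0
    }
}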

When you compute the first expression, 450/(7*60), the processor first does the multiplication, obtaining 420, or

1.1010010000000000000000000000000000000000000000000000 x 2^00000001000

Then it divides 450 by 420. This produces 15/14, which is

1.0001001001001001001001001001001001001001001001001001001001001001001001...

in binary. Now, the Java language specification says that

Inexact results must be rounded to the representable value nearest to the infinitely precise result; if the two nearest representable values are equally near, the one with its least significant bit zero is chosen. This is the IEEE 754 standard's default rounding mode known as round to nearest.

and the nearest representable value to 15/14 in 64-bit IEEE 754 format is

1.0001001001001001001001001001001001001001001001001001 x 2^00000000000

which is approximately 1.0714285714285714 in decimal. (More precisely, this is the least precise decimal value that uniquely specifies this particular binary representation.)
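
You can check this step in Java directly: because 7 * 60 is computed exactly, dividing 450 by it produces the same correctly rounded double as computing 15.0 / 14 in one step (a small sketch):

public class FifteenFourteenthsDemo {
    public static void main(String[] args) {
        double multiplyFirst = 450.00d / (7d * 60);     // exact operands, so this is 15/14 rounded once
        System.out.println(multiplyFirst == 15.0 / 14); // true: the same correctly rounded double
        System.out.println(multiplyFirst);              // 1.0714285714285714
    }
}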

On the other hand, if you compute 450 / 7 first, the result is 64.2857142857..., or in binary,

1000000.01001001001001001001001001001001001001001001001001001001001001001...

for which the nearest representable value is

1.0000000100100100100100100100100100100100100100100101 x 2^00000000110

which is 64.28571428571429180465... Note the change in the last digit of the binary mantissa (compared to the exact value) due to roundoff error. Dividing this by 60 gives you

1.000100100100100100100100100100100100100100100100100110011001100110011...

Look at the end: the pattern is different! It's 0011 that repeats, instead of 001 as in the other case. The closest representable value is

1.0001001001001001001001001001001001001001001001001010 x 2^00000000000

which differs from the other order of operations in the last two bits: they're 10 instead of 01. The decimal equivalent is 1.0714285714285716.

The specific rounding that causes this difference should be clear if you look at the exact binary values:

1.0001001001001001001001001001001001001001001001001001001001001001001001...
1.0001001001001001001001001001001001001001001001001001100110011001100110...
                                                     ^ last bit of mantissa

It works out in this case that the former result, numerically 15/14, happens to be the most accurate representation of the exact value. This is an example of how leaving division until the end benefits you. But again, this rule only holds as long as the values you're working with don't use the full precision of the data type. Once you start working with inexact (rounded) values, you no longer protect yourself from further roundoff errors by doing the multiplications first.
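
Finally, if you want to see all of this from within Java rather than by hand, new BigDecimal(double) expands the exact value a double stores, and Math.ulp confirms that the two orders of operation differ by exactly one unit in the last place. A short sketch:

import java.math.BigDecimal;

public class ExactDoubleValuesDemo {
    public static void main(String[] args) {
        double multiplyFirst = 450.00d / (7d * 60);
        double divideTwice   = 450.00d / 7d / 60;

        // new BigDecimal(double) prints the exact binary value held by the double
        System.out.println(new BigDecimal(multiplyFirst)); // exact expansion, just below 15/14
        System.out.println(new BigDecimal(divideTwice));   // exact expansion, one ulp higher

        // The two results differ by exactly one ulp (unit in the last place)
        System.out.println(divideTwice - multiplyFirst);   // 2.220446049250313E-16
        System.out.println(Math.ulp(multiplyFirst));       // 2.220446049250313E-16, i.e. 2^-52
    }
}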
