The number of correct decimal digits in a product of doubles with a large number of terms


Problem description


What is a tight lower bound on the size of the set of irrational numbers, N, expressed as doubles in Matlab on a 64-bit machine, that I can multiply together while still having confidence in k decimal digits of the product? What precision, for example, could I expect after multiplying together ~10^12 doubles encoding different random chunks of pi?

Solution

A 64-bit floating-point number, assuming standard IEEE 754, has 52+1 bits of mantissa.

That means the relative precision is the gap between 1.0000...0 and 1.0000...1, where the number of binary digits after the binary point is 52. (You can think of 1.000...0 as what is stored in binary in the mantissa, also known as the significand.)

The rounding error is at most (1/2)^52 divided by 2, that is 2^-53 (half the resolution). Note that I chose values as close to 1.0 as possible because that is the worst case (between 1.111..01 and 1.111..11 the relative precision is better).

In decimal, the worst case relative precision of a double is 1.11E-16.
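The figure 1.11E-16 can be checked directly. The following is a minimal Python sketch (Python is used here only for illustration; the question itself is about Matlab), assuming the platform's float is an IEEE 754 double. The names eps and u are chosen for this example.

```python
import sys

# Machine epsilon: the gap between 1.0 and the next larger double (2**-52).
eps = sys.float_info.epsilon

# Unit roundoff: half that gap, the worst-case relative rounding error.
u = eps / 2

print(sys.float_info.mant_dig)  # significand bits, counting the implicit leading 1
print(u == 2.0 ** -53)          # half the resolution, as described above
print(f"{u:.4e}")               # the decimal value quoted in the answer
```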

If you multiply N doubles with this precision, the new relative precision (assuming no additional error due to intermediate rounding) is:

1 - (1 - 1.11E-16)^N

So if you multiply pi (or any double) 10^12 times, the upper bound on the error is:

1.1102e-004

That means you can have confidence in about 4-5 digits.
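As a rough check of that number, the bound 1 - (1 - 1.11E-16)^N can be evaluated for N = 10^12 without looping. This is only a sketch: the helper name accumulated_bound is made up for this example, and math.expm1 and math.log1p are used as a design choice so that the subtraction from 1 does not lose digits.

```python
import math

u = 2.0 ** -53   # worst-case relative rounding error of a double
N = 10 ** 12     # number of factors in the product

def accumulated_bound(factors: int, unit: float = u) -> float:
    """Return 1 - (1 - unit)**factors (hypothetical helper for this example)."""
    # (1 - unit)**factors == exp(factors * log(1 - unit));
    # expm1 and log1p keep the tiny quantities accurate.
    return -math.expm1(factors * math.log1p(-unit))

bound = accumulated_bound(N)
print(f"{bound:.4e}")  # about 1.11e-04, in line with the figure above
```

For small N * u the bound is very close to N * u, which is why 10^12 factors cost about 12 of the roughly 16 decimal digits a double carries.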

You can ignore intermediate rounding error if your CPU has support for extended precision floating point numbers for intermediate results.

If no extended-precision FPU (floating-point unit) is used, rounding in intermediate steps introduces additional error (of the same size as the error each factor contributes). That means a strict bound on the relative error is calculated as:

1 -
((1 - 1.11E-16) * (1 - 1.11E-16) * (1 - 1.11E-16)
                * (1 - 1.11E-16) * (1 - 1.11E-16) % for multiplication, then rounding

               ... (another N-4 lines here) ...

                * (1 - 1.11E-16) * (1 - 1.11E-16))

= 1-(1-1.11E-16)^(N*2-1)

If N is too large, the expanded product takes too long to run. For a smaller case, the possible error (with intermediate rounding) is 2.2204e-012, which is double the value without intermediate rounding, 1-(1 - 1.11E-16)^N = 1.1102e-012. (These figures are consistent with N = 10^4.)

Approximately, we can say that intermediate rounding doubles the error.
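The doubling can be illustrated by evaluating both bounds side by side. The sketch below assumes the two formulas given in this answer, with N factors in one case and 2N - 1 in the other; the helper accumulated_bound and the dictionary ratios are names invented for this example.

```python
import math

u = 2.0 ** -53  # worst-case relative rounding error of a double

def accumulated_bound(factors: int, unit: float = u) -> float:
    """Return 1 - (1 - unit)**factors (hypothetical helper for this example)."""
    return -math.expm1(factors * math.log1p(-unit))

ratios = {}
for n in (10 ** 4, 10 ** 12):
    without_rounding = accumulated_bound(n)       # error of the N inputs only
    with_rounding = accumulated_bound(2 * n - 1)  # plus N-1 rounded products
    ratios[n] = with_rounding / without_rounding
    print(n, f"{without_rounding:.4e}", f"{with_rounding:.4e}", ratios[n])
```

In both cases the ratio comes out very slightly below 2, because 2N - 1 is one factor short of 2N.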

If you multiplied pi 10^12 times and no extended-precision FPU was used (for example, because you write intermediate results to memory, and perhaps do something else, before continuing; just make sure the compiler hasn't reordered your instructions in a way that lets results accumulate in the FPU), then a strict upper bound on your relative error is:

2.22e-004
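A bound like this can be sanity-checked empirically on a much smaller product. The sketch below is an illustration under stated assumptions, not a proof: it multiplies N = 1000 pseudo-random doubles drawn from [0.5, 2.0] (a range picked here to avoid overflow and underflow), computes the exact product with Python's fractions.Fraction, and compares the observed relative error against 1 - (1 - u)^(2N - 1). Because the inputs are taken as exact doubles, only the intermediate roundings contribute, so the observed error should sit well under the bound.

```python
import math
import random
from fractions import Fraction

u = 2.0 ** -53  # worst-case relative rounding error of a double
N = 1000        # small enough for exact rational arithmetic to stay fast

rng = random.Random(0)  # fixed seed so the run is repeatable
factors = [rng.uniform(0.5, 2.0) for _ in range(N)]

float_product = 1.0
exact_product = Fraction(1)
for x in factors:
    float_product *= x            # rounded to a double after each step
    exact_product *= Fraction(x)  # exact: Fraction(x) is the double's true value

# Relative error of the rounded product against the exact one.
observed = float(abs(Fraction(float_product) - exact_product) / exact_product)

# Bound from the answer, with intermediate rounding included.
bound = -math.expm1((2 * N - 1) * math.log1p(-u))

print(f"observed relative error: {observed:.3e}")
print(f"bound:                   {bound:.3e}")
print(observed <= bound)
```

Rounding errors tend to partially cancel, so the observed error is typically far smaller than the worst-case bound.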

Note that confidence in a number of decimal places doesn't guarantee that exactly those decimal digits will be correct.

For example, if the answer is:

1.999999999999, and the error is 1E-5, the actual answer could be 2.000001234.

In this case, even the first decimal digit was wrong. But that really depends on how lucky you are (whether the answer falls on a boundary such as this).
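The boundary effect is easy to reproduce. In this small sketch, matching_leading_digits is a hypothetical helper written for this example; it assumes both values are positive and have the same number of digits before the decimal point.

```python
def matching_leading_digits(a: float, b: float, places: int = 12) -> int:
    """Count the leading decimal digits that a and b share (hypothetical helper)."""
    digits_a = f"{a:.{places}f}".replace(".", "")
    digits_b = f"{b:.{places}f}".replace(".", "")
    count = 0
    for da, db in zip(digits_a, digits_b):
        if da != db:
            break
        count += 1
    return count

computed = 1.999999999999
actual = 2.000001234

print(abs(computed - actual) < 1e-5)                # the error is within 1E-5
print(matching_leading_digits(computed, actual))    # yet no leading digit matches
print(matching_leading_digits(1.234567, 1.234568))  # away from a boundary, most digits match
```

A small error therefore bounds the distance between the two numbers, not the count of identical printed digits.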


This solution assumes that the doubles (including the answer) are all normalized. For denormalized results, the number of binary digits by which the result is denormalized reduces the accuracy by that many binary digits.
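The loss of precision for denormalized (subnormal) values can be seen from the relative spacing of neighbouring doubles. This sketch assumes Python 3.9 or later, where math.ulp is available; the value 2^-1060 is an arbitrary subnormal picked for illustration.

```python
import math
import sys

smallest_normal = sys.float_info.min  # 2**-1022, the smallest normalized double
x = math.ldexp(1.0, -1060)            # 2**-1060, which is 38 binary places below it

# Relative spacing of adjacent doubles near a normalized value.
normal_spacing = math.ulp(1.0) / 1.0

# Subnormals all share the absolute spacing 2**-1074, so the relative
# spacing grows as the value shrinks.
subnormal_spacing = math.ulp(x) / x

print(x < smallest_normal)                 # x is subnormal
print(normal_spacing == 2.0 ** -52)        # 53 significant bits
print(subnormal_spacing == 2.0 ** -14)     # only 53 - 38 = 15 significant bits remain
```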
