scipy.integrate.quad precision on big numbers


Problem description


I am trying to compute the following integral with scipy.integrate.quad(). It is the CDF of an exponential distribution, obtained by integrating its PDF:

import numpy as np
from scipy.integrate import quad

def g(x):
    return .5 * np.exp(-.5 * x)

print(quad(g, a=0., b=np.inf))
print(quad(g, a=0., b=10**6))
print(quad(g, a=0., b=10**5))
print(quad(g, a=0., b=10**4))

And the result is as follows:

(1.0, 3.5807346295637055e-11)
(0.0, 0.0)
(3.881683817604194e-22, 7.717972744764185e-22)
(1.0, 1.6059202674761255e-14)

The large finite upper limits 10**6 and 10**5 give incorrect answers, whereas np.inf and 10**4 give the correct value.

A similar case is discussed in SciPy issue #5428 on GitHub.

What should I do to avoid such errors when integrating other density functions?

Solution

I believe the issue is due to np.exp(-x) quickly becoming very small as x increases, so that it evaluates to zero because of limited numerical precision. For example, even for x as small as x=10**2, np.exp(-x) evaluates to 3.72007597602e-44, whereas x values of order 10**3 or above result in 0.
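The underflow described above can be checked directly. This is a minimal sketch that assumes IEEE 754 double precision, which is what NumPy's float64 uses; the variable names are only illustrative.

```python
import numpy as np

# exp(-100) is tiny but still representable as a double (about 3.72e-44).
small = np.exp(-100.0)

# The true value of exp(-1000) is far below the smallest positive double,
# so the result underflows to exactly 0.0.
with np.errstate(under="ignore"):
    underflowed = np.exp(-1000.0)

print(small, underflowed)
```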

I do not know the implementation specifics of quad, but it probably samples the integrand at a number of points over the given integration range. For a large upper integration limit, most of the samples of np.exp(-x) evaluate to zero, hence the integral value is underestimated. (Note that in these cases the absolute error estimate returned by quad is of the same order as the integral value, which indicates that the value is unreliable.)
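The remark about the error estimate can be turned into a rough sanity check. The helper below, looks_unreliable, is a hypothetical name and not part of SciPy, and the threshold rtol is an arbitrary assumption. The (value, error) pairs are copied from the output in the question, so no integration is run here.

```python
def looks_unreliable(value, abserr, rtol=1e-6):
    """Heuristic: flag a result whose error estimate is not small
    compared with the value itself (rtol is an assumed threshold)."""
    return abserr >= rtol * abs(value)

# (value, abserr) pairs copied from the quad output in the question
results = {
    "b=inf":   (1.0, 3.5807346295637055e-11),
    "b=10**6": (0.0, 0.0),
    "b=10**5": (3.881683817604194e-22, 7.717972744764185e-22),
    "b=10**4": (1.0, 1.6059202674761255e-14),
}

flags = {name: looks_unreliable(v, e) for name, (v, e) in results.items()}
for name, flag in flags.items():
    print(name, "suspicious" if flag else "ok")
```

This check also flags any integral that is genuinely zero, so it is only a hint that the result deserves a second look, not proof of an error.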

One approach to avoid this issue is to restrict the upper integration bound to a value above which the function becomes very small (and hence contributes only marginally to the integral value). From your code snippet, the value of 10**4 appears to be a good choice; however, a value of 10**2 also results in an accurate evaluation of the integral.
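A minimal sketch of the truncation idea follows. The cutoff of 100 is an assumption taken from the 10**2 value mentioned above. Integrating the tail as a separate piece is one possible variation and not something the original answer prescribes. It keeps the nominal upper limit while letting quad concentrate its samples where the integrand is not negligible.

```python
import numpy as np
from scipy.integrate import quad

def g(x):
    return 0.5 * np.exp(-0.5 * x)

cutoff = 100.0   # assumed cutoff: g(100) is about 1e-22
b = 10**6        # the upper limit that failed in the question

# Integrate the part that carries essentially all of the mass ...
head, head_err = quad(g, 0.0, cutoff)
# ... and the negligible tail as its own interval.
tail, tail_err = quad(g, cutoff, b)

total = head + tail
print(total, head_err + tail_err)
```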

Another approach to avoid numerical precision issues is to use a module that performs computations in arbitrary-precision arithmetic, such as mpmath. For example, for x=10**5, mpmath evaluates exp(-x) as follows (using the native mpmath exponential function):

import mpmath as mp
print(mp.exp(-10**5))

3.56294956530937e-43430

Note how small this value is. With the standard hardware numerical precision (used by numpy) this value becomes 0.

mpmath offers an integration function (mp.quad), which can provide an accurate estimate of the integral for arbitrary values of the upper integral bound.

import mpmath as mp

print(mp.quad(lambda x : .5 * mp.exp(-.5 * x), [0, mp.inf]))
print(mp.quad(lambda x : .5 * mp.exp(-.5 * x), [0, 10**13]))
print(mp.quad(lambda x : .5 * mp.exp(-.5 * x), [0, 10**8]))
print(mp.quad(lambda x : .5 * mp.exp(-.5 * x), [0, 10**5]))

1.0
0.999999650469474
0.999999999996516
0.999999999999997

We can also obtain even more accurate estimates by increasing the precision to, say, 50 decimal digits (from 15, which is the default precision):

mp.mp.dps = 50

print(mp.quad(lambda x : .5 * mp.exp(-.5 * x), [0, mp.inf]))
print(mp.quad(lambda x : .5 * mp.exp(-.5 * x), [0, 10**13]))
print(mp.quad(lambda x : .5 * mp.exp(-.5 * x), [0, 10**8]))
print(mp.quad(lambda x : .5 * mp.exp(-.5 * x), [0, 10**5]))

1.0
0.99999999999999999999999999999999999999999829880262
0.99999999999999999999999999999999999999999999997463
0.99999999999999999999999999999999999999999999999998

In general, the cost for obtaining this accuracy is an increased computation time.
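Because higher precision costs time, it may be preferable to raise it only around the computation that needs it. The sketch below assumes mpmath's workdps context manager, which sets the number of decimal digits temporarily and restores the previous setting on exit. The function name f and the other variable names are illustrative.

```python
import mpmath as mp

def f(x):
    return mp.mpf("0.5") * mp.exp(-x / 2)

default_dps = mp.mp.dps          # 15 unless it was changed earlier

with mp.workdps(50):             # 50 digits only inside this block
    value = mp.quad(f, [0, mp.inf])
    error = abs(value - 1)       # distance from the exact answer 1
    print(value)

restored_dps = mp.mp.dps         # back to the earlier setting
```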

P.S.: It goes without saying that if you are able to evaluate your integral analytically in the first place (e.g., with the help of SymPy), you can forget all of the above.
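For the density in the question, the analytic route might look like the sketch below. The symbol names and the helper cdf_numeric are illustrative assumptions. The closed form 1 - exp(-b/2) is then evaluated with math.expm1 so that no numerical integration is needed for any upper limit.

```python
import math
import sympy as sp

x, b = sp.symbols("x b", positive=True)
pdf = sp.Rational(1, 2) * sp.exp(-x / 2)

cdf = sp.integrate(pdf, (x, 0, b))         # symbolic integral over [0, b]
total = sp.integrate(pdf, (x, 0, sp.oo))   # integral over [0, infinity)
print(cdf, total)

def cdf_numeric(upper):
    # Closed form 1 - exp(-upper/2), written with expm1 to limit rounding error
    return -math.expm1(-0.5 * upper)

print(cdf_numeric(10**5))
```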
