Is there any documentation of numpy numerical stability?


Problem description


I looked around for documentation of how numpy/scipy functions behave in terms of numerical stability, e.g. whether any measures are taken to improve numerical stability, or whether there are alternative stable implementations.

I am specifically interested in addition (the + operator) of floating-point arrays, numpy.sum(), numpy.cumsum() and numpy.dot(). In all cases I am essentially summing a very large number of floating-point values, and I am concerned about the accuracy of such calculations.

Does anyone know of any reference to such issues in the numpy/scipy documentation or some other source?

Solution

"Stability" is a property of an algorithm. If your algorithm is unstable to start with, then increasing precision or reducing the rounding error of the component steps will not gain much.

The more complex numpy routines, such as "solve", are wrappers for the ATLAS/BLAS/LAPACK routines, and you can refer to the documentation there; for example, "dgesv" solves a system of real-valued linear equations using an LU decomposition with partial pivoting and row interchanges. The underlying Fortran documentation for LAPACK can be seen at http://www.netlib.org/lapack/explore-html/, but http://docs.scipy.org/doc/numpy/user/install.html points out that many different builds of the standard routine implementations are available, and speed and precision will vary between them.
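For instance, numpy.linalg.solve is the numpy-level entry point to LAPACK's *gesv family, so its stability characteristics are those of the underlying LAPACK build. A minimal illustration (the 2x2 system here is made up for demonstration):

import numpy as np

# numpy.linalg.solve dispatches to LAPACK's *gesv routines
# (LU factorisation with partial pivoting and row interchanges)
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)       # solves A @ x = b
print(x)                        # [2. 3.]
print(np.allclose(A @ x, b))    # True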

Your examples don't introduce much rounding: "+" has no unnecessary rounding, and the precision depends purely on the rounding implicit in the floating-point datatype when the smaller operand has low-order bits that cannot be represented in the result. sum and dot depend only on the order of evaluation; cumsum cannot easily be re-ordered, as it outputs an array.
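If you control the summation loop yourself, one classical order-aware remedy is Kahan (compensated) summation. numpy.sum does not expose this (recent numpy versions instead use pairwise summation internally for many reductions); the sketch below is a plain-Python illustration of the technique, not a performance recommendation:

import numpy as np

def kahan_sum(values):
    """Left-to-right float32 sum carrying a running compensation term."""
    total = np.float32(0.0)
    comp = np.float32(0.0)      # low-order bits lost so far
    for v in values:
        y = v - comp            # re-inject the previous rounding error
        t = total + y           # big + small: low-order bits of y drop...
        comp = (t - total) - y  # ...and are recovered here algebraically
        total = t
    return total

y = (1.0 / np.arange(1, 100000)).astype(np.float32)
print(kahan_sum(y))             # close to the float64 reference below
print(y.sum(dtype=np.float64))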

For the cumulative rounding during a "cumsum" or "dot" function you do have choices:

On 64-bit Linux, numpy provides access to a high-precision "long double" type, float128, which you can use to reduce loss of precision in intermediate calculations at the cost of performance and memory.

However, on my Win64 install "numpy.longdouble" maps to "numpy.float64", a normal C double type, so such code is not cross-platform; check "finfo". (Neither a float96 nor a float128 with genuinely higher precision exists on Canopy Express Win64.)

log2(finfo(float64).resolution)
> -49.828921423310433                 # 53 bits of mantissa ~ 16 significant decimal figures

log2(finfo(float32).resolution)
> -19.931568                          # ~ only 7 meaningful digits

Since sum() and dot() reduce the array to a single value, maximising precision is easy with built-ins:

from numpy import *   # arange, array, sum, dot, einsum, float32, float64, random

x = arange(1, 1000000, dtype=float32)
y = (1.0 / arange(1, 1000000)).astype(float32)   # [1, 1/2, 1/3, ...] as float32
sum(x)                     #  4.9994036e+11  (float32 accumulator drifts)
sum(x, dtype=float64)      #  499999500000.0 (exact: 999999 * 1000000 / 2)
sum(y)                     #  14.357357
sum(y, dtype=float64)      #  14.392725788474309
dot(x, y)                            # 999999.0
einsum('i,i', x, y)                  # dot product = 999999.0
einsum('i,i', x, y, dtype=float64)   # 999999.00003965141

  • Note that the single-precision roundings within "dot" cancel in this case, as each almost-integer product is rounded to an exact integer.

Optimising rounding depends on the kind of thing you are adding up: adding many small numbers first can help delay rounding, but it will not avoid problems where big numbers exist but cancel each other out, since the intermediate calculations still cause a loss of precision.
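A contrived float32 illustration of that ordering effect (the magnitudes are chosen so that small terms are absorbed by the large one; an explicit loop is used so the evaluation order is unambiguous):

import numpy as np

def left_to_right(a):
    total = np.float32(0.0)
    for v in a:
        total = total + v       # strict left-to-right float32 additions
    return total

data = np.array([1e8] + [1.0] * 100, dtype=np.float32)
print(left_to_right(data))           # 1e8: each 1.0 falls below the big term's ulp
print(left_to_right(np.sort(data)))  # ~1.000001e8: the 1.0s are summed first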

An example showing evaluation-order dependence:

x = array([1., 2e-15, 8e-15, -0.7, -0.3], dtype=float32)
# evaluates to
# array([  1.00000000e+00,   2.00000001e-15,   8.00000003e-15,
#         -6.99999988e-01,  -3.00000012e-01], dtype=float32)
sum(x)                      # 0.0
sum(x, dtype=float64)       # 9.9920072216264089e-15
sum(random.permutation(x))  # 9.9999998e-15 / 2e-15 / 0.0, depending on the order
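If you want an order-independent reference to judge such experiments against, the standard library's math.fsum tracks exact partial sums (a stdlib tool, not a numpy feature):

import math
import numpy as np

x = np.array([1., 2e-15, 8e-15, -0.7, -0.3], dtype=np.float32)
# math.fsum is exactly rounded, so the order of the terms cannot change it:
print(math.fsum(x))
print(math.fsum(np.random.permutation(x)))  # identical result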
