numpy.polyfit gives useful fit, but infinite covariance matrix


Problem description

I am trying to fit a polynomial to a set of data. Sometimes the covariance matrix returned by numpy.polyfit consists only of inf, even though the fit itself seems useful. There are no numpy.inf or numpy.nan values in the data!

Example:

import numpy as np

# sample data, does not really contain x**2-like behaviour,
# but that should be visible in the fit results
x = [-449., -454., -459., -464., -469.]
y = [ 0.9677024,   0.97341953,  0.97724978,  0.98215678,  0.9876293]

fit, cov = np.polyfit(x, y, 2, cov=True)

print('fit: ', fit)
print('cov: ', cov)

Result:

fit: [  1.67867158e-06   5.69199547e-04   8.85146009e-01]
cov: [[ inf  inf  inf]
      [ inf  inf  inf]
      [ inf  inf  inf]]

np.cov(x,y) gives

[[  6.25000000e+01  -6.07388099e-02]
 [ -6.07388099e-02   5.92268942e-05]]

So np.cov is not the same as the covariance returned by np.polyfit. Does anybody have an idea what's going on?

EDIT: I now understand that numpy.cov is not what I want. I need the variances of the polynomial coefficients, but I don't get them if (len(x) - order - 2.0) == 0. Is there another way to get the variances of the fitted polynomial coefficients?

Solution

As rustil's answer says, this is caused by the bias correction applied to the denominator of the covariance equation, which for this input results in a division by zero. The reasoning behind this correction is similar to that behind Bessel's correction. It is really a sign that there are too few data points to estimate the covariance in a well-defined way.
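A minimal sketch of the arithmetic behind the inf, assuming (as in NumPy's polyfit source) that order = deg + 1 and that the old scaling factor was the residual sum of squares divided by len(x) - order - 2.0, the expression quoted in the question. The variable names are illustrative only.

```python
import numpy as np

x = np.array([-449., -454., -459., -464., -469.])
y = np.array([0.9677024, 0.97341953, 0.97724978, 0.98215678, 0.9876293])
deg = 2
order = deg + 1  # number of fitted coefficients

# full=True returns (coeffs, residuals, rank, singular_values, rcond);
# residuals holds the sum of squared residuals of the least-squares fit.
_, resids, _, _, _ = np.polyfit(x, y, deg, full=True)
rss = resids[0]

old_denominator = len(x) - order - 2.0  # bias-corrected denominator: 5 - 3 - 2 = 0
dof = len(x) - order                    # plain degrees of freedom: 5 - 3 = 2

with np.errstate(divide="ignore"):
    old_scale = rss / old_denominator   # division by zero gives inf, which then multiplies every matrix entry
new_scale = rss / dof                   # finite estimate of the residual variance

print(old_denominator, old_scale)
print(dof, new_scale)
```

With five points and a quadratic, the old denominator is exactly zero, so any positive residual sum becomes inf.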

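How can you skirt this problem? This version of polyfit accepts weights, so you could add another data point but give it a weight of epsilon. The extra point has essentially no influence on the fit, but it raises len(x) by one. That has the same effect as reducing the 2.0 in the denominator len(x) - order - 2.0 to a 1.0.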

import sys
import numpy as np

x = [-449., -454., -459., -464., -469.]
y = [ 0.9677024,   0.97341953,  0.97724978,  0.98215678,  0.9876293]

# duplicate the last point and give the copy a negligible (machine-epsilon) weight
x_extra = x + x[-1:]
y_extra = y + y[-1:]
weights = [1.0, 1.0, 1.0, 1.0, 1.0, sys.float_info.epsilon]

fit, cov = np.polyfit(x, y, 2, cov=True)
fit_extra, cov_extra = np.polyfit(x_extra, y_extra, 2, w=weights, cov=True)

print(fit == fit_extra)
print(cov_extra)

The output. Note that the fit values are identical:

>>> print(fit == fit_extra)
[ True  True  True]
>>> print(cov_extra)
[[  8.84481850e-11   8.11954338e-08   1.86299297e-05]
 [  8.11954338e-08   7.45405039e-05   1.71036963e-02]
 [  1.86299297e-05   1.71036963e-02   3.92469307e+00]]

I am very uncertain that this will be especially meaningful, but it's a way to work around the problem. It's a bit of a kludge though. For something more robust, you could modify polyfit to accept its own ddof parameter, perhaps in lieu of the boolean that cov currently accepts. (I just opened an issue to suggest as much.)
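To make the ddof idea concrete, here is a rough sketch of a wrapper with its own ddof argument. The function polyfit_ddof is hypothetical and not part of NumPy. It assumes the same recipe NumPy's polyfit uses: a Vandermonde matrix with the weights multiplied into its rows, column scaling, numpy.linalg.lstsq, and the residual sum of squares divided by len(x) - (deg + 1) - ddof. With ddof=0 it should agree with current np.polyfit(..., cov=True). ddof=2 corresponds to the old correction, and for five points it raises an error here instead of returning inf.

```python
import numpy as np


def polyfit_ddof(x, y, deg, ddof=0, w=None):
    """Hypothetical helper (not a NumPy API): least-squares polynomial fit whose
    covariance is scaled by rss / (len(x) - (deg + 1) - ddof)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.ones_like(x) if w is None else np.asarray(w, dtype=float)

    order = deg + 1
    dof = len(x) - order - ddof
    if dof <= 0:
        raise ValueError(f"need more than {order + ddof} points for deg={deg}, ddof={ddof}")

    lhs = np.vander(x, order) * w[:, None]    # Vandermonde matrix with the weights multiplied in
    rhs = y * w
    scale = np.sqrt((lhs * lhs).sum(axis=0))  # column norms; dividing by them improves conditioning
    lhs /= scale

    c, _, _, _ = np.linalg.lstsq(lhs, rhs, rcond=None)
    rss = np.sum((rhs - lhs @ c) ** 2)        # weighted sum of squared residuals
    coeffs = c / scale                        # undo the column scaling

    cov = np.linalg.inv(lhs.T @ lhs) / np.outer(scale, scale)
    return coeffs, cov * (rss / dof)


x = [-449., -454., -459., -464., -469.]
y = [0.9677024, 0.97341953, 0.97724978, 0.98215678, 0.9876293]

fit, cov = polyfit_ddof(x, y, 2)            # dof = 5 - 3 - 0 = 2
fit1, cov1 = polyfit_ddof(x, y, 2, ddof=1)  # dof = 1, so the covariance doubles
```

Making ddof explicit leaves the choice of correction to the caller rather than hard-coding it.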

A quick final note about the calculation of cov: if you look at the Wikipedia page on least squares regression, you'll see that the simplified formula for the covariance of the coefficients is inv(dot(dot(X.T, W), X)). At least roughly speaking, that formula has a corresponding line in the numpy code. In this case, X is the Vandermonde matrix, and the weights have already been multiplied in. The numpy code also does some column scaling, which is part of a strategy to minimize numerical error. It then multiplies the result by the sum of squared residuals divided by the denominator discussed above. That factor is an estimate of the residual variance σ² in Cov = σ²·inv(XᵀWX). The simplified formula omits it because it assumes the weights are known absolute uncertainties (reciprocal variances).
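The quantities described above can be written out roughly as follows. This assumes the weights w_i are applied to the rows of X and to y as in polyfit, so the W of the textbook formula is diag(w_i²). Here n = len(x), p = deg + 1, and k is the extra correction: k = 2 in the NumPy version from the question, k = 0 in current NumPy.

```latex
\tilde{X} = \operatorname{diag}(w)\,X, \qquad \tilde{y} = \operatorname{diag}(w)\,y, \qquad W = \operatorname{diag}(w_1^2, \dots, w_n^2)

\hat{\beta} = \arg\min_{\beta} \lVert \tilde{y} - \tilde{X}\beta \rVert^2

\operatorname{Cov}(\hat{\beta}) = \hat{\sigma}^2 \,(\tilde{X}^\top \tilde{X})^{-1} = \hat{\sigma}^2 \,(X^\top W X)^{-1},
\qquad
\hat{\sigma}^2 = \frac{\lVert \tilde{y} - \tilde{X}\hat{\beta} \rVert^2}{n - p - k}

% question's data: n = 5, p = 3, k = 2  =>  n - p - k = 0, so \hat{\sigma}^2 = \infty
```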
