Why isn't `curve_fit` able to estimate the covariance of the parameter if the parameter fits exactly?
Problem description
I don't understand why `curve_fit` isn't able to estimate the covariance of the parameter, thus raising the `OptimizeWarning` below. The following MCVE illustrates my problem:
MCVE Python snippet
from scipy.optimize import curve_fit
func = lambda x, a: a * x
popt, pcov = curve_fit(f = func, xdata = [1], ydata = [1])
print(popt, pcov)
Output
\python-3.4.4\lib\site-packages\scipy\optimize\minpack.py:715:
OptimizeWarning: Covariance of the parameters could not be estimated
category=OptimizeWarning)
[ 1.] [[ inf]]
For `a = 1` the function fits `xdata` and `ydata` exactly. Why isn't the error/variance 0, or something close to 0, but `inf` instead?
There is this quote from the `curve_fit` entry in the SciPy Reference Guide:
If the Jacobian matrix at the solution doesn't have a full rank, then 'lm' method returns a matrix filled with np.inf, on the other hand 'trf' and 'dogbox' methods use Moore-Penrose pseudoinverse to compute the covariance matrix.
So, what's the underlying problem? Why doesn't the Jacobian matrix at the solution have full rank?
Recommended answer
The formula for the covariance of the parameters (see Wikipedia) has the number of degrees of freedom in the denominator. The degrees of freedom are computed as (number of data points) - (number of parameters), which is 1 - 1 = 0 in your example. SciPy checks the number of degrees of freedom before dividing by it; when that number is not positive, it fills the covariance matrix with inf and issues the OptimizeWarning instead of dividing by zero.
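To make the role of the degrees of freedom concrete, here is a minimal pure-Python sketch for the one-parameter model y = a * x from the question. The helper name `one_param_variance` is hypothetical and the code only mirrors the textbook formula (residual sum of squares divided by the degrees of freedom, scaled by the inverse of J^T J); it is not SciPy's actual implementation.

```python
import math


def one_param_variance(xs, ys, a):
    """Hypothetical helper: estimated variance of `a` for the model y = a * x."""
    n_points, n_params = len(xs), 1
    dof = n_points - n_params  # degrees of freedom
    if dof <= 0:
        # Nothing left to estimate the noise level from: report inf
        # rather than dividing by zero.
        return math.inf
    # Residual sum of squares at the fitted parameter.
    rss = sum((y - a * x) ** 2 for x, y in zip(xs, ys))
    # For y = a * x the Jacobian column is just x, so J^T J = sum(x^2).
    jtj = sum(x * x for x in xs)
    return (rss / dof) / jtj


print(one_param_variance([1], [1], a=1.0))        # one point: dof = 0
print(one_param_variance([1, 2], [1, 2], a=1.0))  # two points: dof = 1, rss = 0
```

The first call hits the dof check and returns inf; the second has one degree of freedom and a zero residual, so the estimated variance is 0.0.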
With `xdata = [1, 2]`, `ydata = [1, 2]` you would get zero covariance. Note that the model still fits exactly, so the exact fit is not the problem.
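The two-point case can be checked directly with the question's own model. This is a small sketch that assumes NumPy and SciPy are installed; the exact printed formatting may differ between versions.

```python
import numpy as np
from scipy.optimize import curve_fit


def func(x, a):
    # Same one-parameter linear model as in the question.
    return a * x


# Two data points and one parameter leave one degree of freedom.
xdata = np.array([1.0, 2.0])
ydata = np.array([1.0, 2.0])

popt, pcov = curve_fit(func, xdata, ydata)
print(popt, pcov)
```

Here the fit is still exact, but the covariance estimate is finite (zero up to rounding) because the number of degrees of freedom is positive.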
This is the same sort of issue as the sample variance being undefined when the sample size N is 1 (the formula for the sample variance has N - 1 in the denominator). If we draw a sample of size 1 from a population, we don't estimate the variance as zero; we simply know nothing about the variance.
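The sample-variance analogy can be seen with Python's standard `statistics` module, which refuses to compute a sample variance from a single observation. The helper name `try_sample_variance` is only for illustration.

```python
import statistics


def try_sample_variance(sample):
    """Return the sample variance, or None if it is undefined for this sample."""
    try:
        # statistics.variance uses the (N - 1) denominator.
        return statistics.variance(sample)
    except statistics.StatisticsError:
        # Raised when the sample has fewer than two data points.
        return None


print(try_sample_variance([1.0]))       # a single observation
print(try_sample_variance([1.0, 2.0]))  # two observations
```

With one observation the result is undefined (None here), not zero; with two observations the variance can be estimated.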