numpy.polyfit has no keyword 'cov'


Problem description


I'm trying to use polyfit to find the best fitting straight line to a set of data, but I also need to know the uncertainty on the parameters, so I want the covariance matrix too. The online documentation suggests I write:


polyfit(x, y, 2, cov=True)

But this gives me an error:


TypeError: polyfit() got an unexpected keyword argument 'cov'


And sure enough, help(polyfit) shows no keyword argument 'cov'.


So does the online documentation refer to a previous release of numpy? (I have 1.6.1, the newest one). I could write my own polyfit function, but has anyone got any suggestions for why I don't have a covariance option on my polyfit?
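Update: the cov keyword was added to numpy.polyfit in a later NumPy release (1.7), so with a current NumPy the documented call works. A minimal sketch, with x and y as stand-in data:

import numpy as np

# Stand-in data, assumed purely for illustration; use your own x and y.
x = np.linspace(0.0, 1.0, 50)
y = 1.0 + 2.0 * x + 3.0 * x**2 + np.random.normal(scale=0.1, size=x.shape)

coeffs, V = np.polyfit(x, y, 2, cov=True)  # V is the covariance matrix of the coefficients
param_err = np.sqrt(np.diag(V))            # one-sigma uncertainties on the fitted parameters
print(coeffs)
print(param_err)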

Thanks.

Recommended answer


For a solution that comes from a library, I find that using scikits.statsmodels is a convenient choice. In statsmodels, regression objects have callable attributes that return the parameters and standard errors. I put an example of how this would work for you below:

# Imports; I assume NumPy for forming your data.
import numpy as np
import scikits.statsmodels.api as sm  # on newer installs this is just `statsmodels.api`

# Form the data here
(X, Y) = ....

# Desired polynomial degree (matches the standalone example below)
deg = 2

reg_x_data = np.ones(X.shape)                    # 0th degree term
for ii in range(1, deg + 1):
    reg_x_data = np.hstack((reg_x_data, X**ii))  # append the ii-th degree term

# Store the OLS regression results in `result`
result = sm.OLS(Y, reg_x_data).fit()


# Print the estimated coefficients
print(result.params)

# Print the basic OLS standard errors of the coefficients
print(result.bse)

# Print the estimated basic OLS covariance matrix
print(result.cov_params())  # <-- Notice, this one is a function by convention.

# Print the heteroskedasticity-consistent standard errors
print(result.HC0_se)

# Print the heteroskedasticity-consistent covariance matrix
print(result.cov_HC0)


There are additional robust covariance attributes in the result object as well. You can see them by printing out dir(result). Also, by convention, the covariance matrices for the heteroskedasticity-consistent estimators only become available after you have accessed the corresponding standard errors: you must access result.HC0_se prior to result.cov_HC0, because the first reference causes the second one to be computed and stored.
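For example, a quick way to list those attributes (a sketch; the names follow the statsmodels convention, and the exact set varies by release):

# List the robust-covariance-related attributes on the fitted result.
print([name for name in dir(result) if name.startswith('cov_HC') or name.endswith('_se')])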


Pandas is another library that probably provides more advanced support for these operations.

Non-library solution


This might be useful when you don't want to, or can't, rely on an extra library.


Below is a function that I wrote to return the OLS regression coefficients, along with a bunch of other quantities. It returns the residuals, the regression variance and the standard error of the regression (the square root of the residual variance), the asymptotic formula for the large-sample variance, the OLS covariance matrix, the heteroskedasticity-consistent "robust" covariance estimate (which is the OLS covariance weighted according to the residuals), and the "White" or "bias-corrected" heteroskedasticity-consistent covariance.

import numpy as np

###
# Regression and standard error estimation functions
###
def ols_linreg(X, Y):
    """ ols_linreg(X,Y) 

        Ordinary least squares regression estimator given explanatory variables 
        matrix X and observations matrix Y. The length of the first dimension of
        X and Y must be the same (equal to the number of samples in the data set).

        Note: these methods should be adapted if you need to use this for large data.
        This is mostly for illustrating how to calculate the different classical
        standard errors. You would never really want to compute the inverse matrices
        for large problems.

        This was developed with NumPy 1.5.1.
    """
    (N, K) = X.shape
    t1 = np.linalg.inv(X.T.dot(X))  # (X'X)^{-1}
    t2 = X.T.dot(Y)                 # X'Y

    beta = t1.dot(t2)               # OLS coefficient estimates
    residuals = Y - X.dot(beta)
    sig_hat = (1.0 / (N - K)) * np.sum(residuals**2)  # residual variance
    sig_hat_asymptotic_variance = 2 * sig_hat**2 / N  # large-sample variance of sig_hat
    conv_st_err = np.sqrt(sig_hat)                    # standard error of the regression

    # Accumulate (1/N) * X'X as an average of outer products.
    sum1 = 0.0
    for ii in range(N):
        sum1 = sum1 + np.outer(X[ii, :], X[ii, :])

    sum1 = (1.0 / N) * sum1
    ols_cov = (sig_hat / N) * np.linalg.inv(sum1)     # classical OLS covariance

    # Projection ("hat") matrix; its diagonal holds the leverage of each sample.
    PX = X.dot(np.linalg.inv(X.T.dot(X)).dot(X.T))
    robust_se_mat1 = np.linalg.inv(X.T.dot(X))
    robust_se_mat2 = X.T.dot(np.diag(residuals[:, 0]**2).dot(X))
    robust_se_mat3 = X.T.dot(np.diag(residuals[:, 0]**2 / (1.0 - np.diag(PX))).dot(X))

    # Sandwich estimators: (X'X)^{-1} * middle term * (X'X)^{-1}
    v_robust = robust_se_mat1.dot(robust_se_mat2.dot(robust_se_mat1))
    v_modified_robust = robust_se_mat1.dot(robust_se_mat3.dot(robust_se_mat1))

    """ Returns:
        beta -- The vector of coefficient estimates, ordered on the columns on X.
        residuals -- The vector of residuals, Y - X.beta
        sig_hat -- The sample variance of the residuals.
        conv_st_error -- The 'standard error of the regression', sqrt(sig_hat).
        sig_hat_asymptotic_variance -- The analytic formula for the large sample variance
        ols_cov -- The covariance matrix under the basic OLS assumptions.
        v_robust -- The "robust" covariance matrix, weighted to account for the residuals and heteroskedasticity.
        v_modified_robust -- The bias-corrected and heteroskedasticity-consistent covariance matrix.
    """
    return beta, residuals, sig_hat, conv_st_err, sig_hat_asymptotic_variance, ols_cov, v_robust, v_modified_robust


For your problem, you would use it like this:

import numpy as np

# Define or load your data:
(Y, X) = ....

# Desired polynomial degree
deg = 2

reg_x_data = np.ones(X.shape)                    # 0th degree term
for ii in range(1, deg + 1):
    reg_x_data = np.hstack((reg_x_data, X**ii))  # append the ii-th degree term

# Get all of the regression data.
beta, residuals, sig_hat, conv_st_err, sig_hat_asymptotic_variance, ols_cov, v_robust, v_modified_robust = ols_linreg(reg_x_data, Y)

# Print the covariance matrix:
print(ols_cov)


If you spot any bugs in my computations (especially the heteroskedasticity-consistent estimators) please let me know and I'll fix it asap.
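As a sanity check, the covariances from ols_linreg should agree with statsmodels on the same design matrix. A sketch on synthetic data (assumed purely for illustration; statsmodels.api is the modern import path, and cov_HC2 is the statsmodels name for the leverage-weighted estimator computed above):

import numpy as np
import statsmodels.api as sm  # modern import path for scikits.statsmodels

# Synthetic quadratic data, assumed purely for illustration.
rng = np.random.default_rng(0)
N, deg = 200, 2
X = rng.normal(size=(N, 1))
Y = 1.0 + 2.0 * X - 0.5 * X**2 + rng.normal(scale=0.3, size=(N, 1))

# Build the polynomial design matrix exactly as above.
reg_x_data = np.ones(X.shape)
for ii in range(1, deg + 1):
    reg_x_data = np.hstack((reg_x_data, X**ii))

beta, residuals, sig_hat, conv_st_err, sig_hat_asymptotic_variance, ols_cov, v_robust, v_modified_robust = ols_linreg(reg_x_data, Y)
result = sm.OLS(Y, reg_x_data).fit()

print(np.allclose(ols_cov, result.cov_params()))       # classical OLS covariance
print(np.allclose(v_robust, result.cov_HC0))           # White/HC0 sandwich estimator
print(np.allclose(v_modified_robust, result.cov_HC2))  # leverage-weighted (HC2)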

