Statsmodels with partly identified model

Question

I am trying to run a regression where only some of the coefficients can be identified:

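import numpy as np
import pandas as pd
import statsmodels.api as sm

# Two observations; x1 and x2 are constant, only x3 varies.
data = np.array([[2, 1, 1, 1], [1, 1, 1, 0]])
df = pd.DataFrame(data, columns=['y', 'x1', 'x2', 'x3'])
z = df.pop('y')
mod = sm.OLS(z, sm.add_constant(df))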

Now, I have two outcomes, and the only variable that changes between the two observations is x3. So I would expect that (since I added a constant) the model would be unable to identify x1 or x2 and would omit them. It should, however, give me a coefficient of 1 for x3, since the presence of that effect increases y by one.

Stata gives me exactly this outcome, and it reminds me that it cannot estimate a standard error on the coefficient for x3. statsmodels, on the other hand...

res = mod.fit()
res.summary()
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                    nan
Method:                 Least Squares   F-statistic:                       nan
Date:                Sun, 30 Aug 2020   Prob (F-statistic):                nan
Time:                        14:28:28   Log-Likelihood:                 66.947
No. Observations:                   2   AIC:                            -129.9
Df Residuals:                       0   BIC:                            -132.5
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             0.5000        inf          0        nan         nan         nan
x2             0.5000        inf          0        nan         nan         nan
x3             1.0000        inf          0        nan         nan         nan
==============================================================================
Omnibus:                          nan   Durbin-Watson:                   0.200
Prob(Omnibus):                    nan   Jarque-Bera (JB):                0.333
Skew:                           0.000   Prob(JB):                        0.846
Kurtosis:                       1.000   Cond. No.                         3.23
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The input rank is higher than the number of observations.

What is happening here? And how can I get my expected output?

Recommended answer

statsmodels uses the Moore-Penrose generalized inverse (pinv) to estimate the parameters in the linear regression models OLS, WLS, and GLS.

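As a result, it returns a regularized (minimum-norm) solution when the design matrix is singular. Here x1 and x2 are identical columns, so their combined effect is split evenly between them, which is why each gets a coefficient of 0.5.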
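The minimum-norm behaviour can be checked with NumPy alone. The following is a minimal sketch, not statsmodels' internal code: it applies `np.linalg.pinv` to the design matrix from the question and compares the result with another exact solution, `alt`, which is a hypothetical coefficient vector chosen here to mimic what omitting x2 would give.

```python
import numpy as np

# Design matrix from the question (columns x1, x2, x3); x1 and x2 are identical.
X = np.array([[1.0, 1.0, 1.0],
              [1.0, 1.0, 0.0]])
y = np.array([2.0, 1.0])

# Least-squares solution via the Moore-Penrose pseudoinverse.
beta = np.linalg.pinv(X) @ y
print(np.round(beta, 4))  # approximately 0.5, 0.5, 1.0, as in the summary table

# A different coefficient vector (hypothetical, for comparison) that fits
# the two observations just as well: all weight on x1, none on x2.
alt = np.array([1.0, 0.0, 1.0])
fits_exactly = np.allclose(X @ alt, y)

# Among all exact fits, the pinv solution has the smallest Euclidean norm.
pinv_is_shorter = np.linalg.norm(beta) < np.linalg.norm(alt)
print(fits_exactly, pinv_is_shorter)
```

Both vectors reproduce y exactly; the data cannot distinguish them, and pinv simply picks the shortest one.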

The covariance matrix of the parameter estimate has reduced rank, and only some linear combinations of parameters will be identified.
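One way to see which linear combinations are identified is to test whether a weight vector lies in the row space of the design matrix. The sketch below assumes that criterion (the standard estimability condition for linear models); `is_identified` is a hypothetical helper written for this illustration, not a statsmodels function.

```python
import numpy as np

# Design matrix from the question (columns x1, x2, x3).
X = np.array([[1.0, 1.0, 1.0],
              [1.0, 1.0, 0.0]])

# Three columns but only rank 2, so not every coefficient is identified.
rank = np.linalg.matrix_rank(X)

# Orthogonal projector onto the row space of X.
P = np.linalg.pinv(X) @ X

def is_identified(c, tol=1e-10):
    """Return True if the combination c @ beta is determined by the data,
    i.e. if c is unchanged by projection onto the row space of X."""
    c = np.asarray(c, dtype=float)
    return bool(np.allclose(P @ c, c, atol=tol))

print(rank)
print(is_identified([0, 0, 1]))  # coefficient on x3 alone
print(is_identified([1, 1, 0]))  # sum of the x1 and x2 coefficients
print(is_identified([1, 0, 0]))  # coefficient on x1 alone
```

In this example the x3 coefficient and the sum of the x1 and x2 coefficients are identified, while x1 on its own is not.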

However, the model can still be used for prediction, provided the linear relationship among the regressors in the data also holds in the prediction samples.
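A small NumPy sketch of this point: two different exact solutions give the same predictions as long as new rows keep x1 equal to x2, and disagree once that relationship breaks. The arrays `X_same` and `X_broken` and the vector `beta_alt` are made-up illustrations, not data from the question.

```python
import numpy as np

X = np.array([[1.0, 1.0, 1.0],
              [1.0, 1.0, 0.0]])
y = np.array([2.0, 1.0])

beta_pinv = np.linalg.pinv(X) @ y      # minimum-norm solution
beta_alt = np.array([1.0, 0.0, 1.0])   # another exact fit (hypothetical)

# New rows where x1 == x2, as in the estimation sample.
X_same = np.array([[1.0, 1.0, 1.0],
                   [2.0, 2.0, 0.0]])
# A new row where x1 != x2, so the collinearity no longer holds.
X_broken = np.array([[1.0, 0.0, 0.0]])

same_agree = np.allclose(X_same @ beta_pinv, X_same @ beta_alt)
broken_agree = np.allclose(X_broken @ beta_pinv, X_broken @ beta_alt)
print(same_agree, broken_agree)
```

When the relationship holds, the prediction depends only on identified combinations, so the choice among exact solutions does not matter.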
