Statsmodels.formula.api OLS does not show statistical values of intercept

Question

I am running the following source code:

import numpy as np
import statsmodels.formula.api as sm

# Add one column of ones for the intercept term
X = np.append(arr= np.ones((50, 1)).astype(int), values=X, axis=1)

regressor_OLS = sm.OLS(endog=y, exog=X).fit()
print(regressor_OLS.summary())

where X is a 50x5 (before adding the intercept term) numpy array which looks like this:

[[0 1 165349.20 136897.80 471784.10]
 [0 0 162597.70 151377.59 443898.53]...]

and y is a 50x1 numpy array with float values for the dependent variable.

The first two columns encode a dummy variable with three different values. The remaining columns are three different independent variables.

Although it is said that statsmodels.formula.api.OLS automatically adds an intercept term (see @stellacia's answer here: OLS using statsmodel.formula.api versus statsmodel.api), its summary does not show the statistical values of the intercept term, as is evident below in my case:

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 Profit   R-squared:                       0.988
Model:                            OLS   Adj. R-squared:                  0.986
Method:                 Least Squares   F-statistic:                     727.1
Date:                Sun, 01 Jul 2018   Prob (F-statistic):           7.87e-42
Time:                        21:40:23   Log-Likelihood:                -545.15
No. Observations:                  50   AIC:                             1100.
Df Residuals:                      45   BIC:                             1110.
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1          3464.4536   4905.406      0.706      0.484   -6415.541    1.33e+04
x2          5067.8937   4668.238      1.086      0.283   -4334.419    1.45e+04
x3             0.7182      0.066     10.916      0.000       0.586       0.851
x4             0.3113      0.035      8.885      0.000       0.241       0.382
x5             0.0786      0.023      3.429      0.001       0.032       0.125
==============================================================================
Omnibus:                        1.355   Durbin-Watson:                   1.288
Prob(Omnibus):                  0.508   Jarque-Bera (JB):                1.241
Skew:                          -0.237   Prob(JB):                        0.538
Kurtosis:                       2.391   Cond. No.                     8.28e+05
==============================================================================

For this reason, I added the following line to my source code:

X = np.append(arr= np.ones((50, 1)).astype(int), values=X, axis=1)

as you can see at the beginning of my post, and the statistical values of the intercept/constant are then shown as below:

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 Profit   R-squared:                       0.951
Model:                            OLS   Adj. R-squared:                  0.945
Method:                 Least Squares   F-statistic:                     169.9
Date:                Sun, 01 Jul 2018   Prob (F-statistic):           1.34e-27
Time:                        20:25:21   Log-Likelihood:                -525.38
No. Observations:                  50   AIC:                             1063.
Df Residuals:                      44   BIC:                             1074.
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const       5.013e+04   6884.820      7.281      0.000    3.62e+04     6.4e+04
x1           198.7888   3371.007      0.059      0.953   -6595.030    6992.607
x2           -41.8870   3256.039     -0.013      0.990   -6604.003    6520.229
x3             0.8060      0.046     17.369      0.000       0.712       0.900
x4            -0.0270      0.052     -0.517      0.608      -0.132       0.078
x5             0.0270      0.017      1.574      0.123      -0.008       0.062
==============================================================================
Omnibus:                       14.782   Durbin-Watson:                   1.283
Prob(Omnibus):                  0.001   Jarque-Bera (JB):               21.266
Skew:                          -0.948   Prob(JB):                     2.41e-05
Kurtosis:                       5.572   Cond. No.                     1.45e+06
==============================================================================

Why are the statistical values of the intercept not shown when I do not add an intercept term myself, even though it is said that statsmodels.formula.api.OLS adds one automatically?

Recommended answer

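"No constant is added by the model unless you are using formulas." The uppercase OLS class fits exactly the arrays it is given, whereas the lowercase ols function builds the design matrix from a formula string and includes an intercept by default. Therefore, try something like the example below. Variable names should be defined according to your data set.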

Use

import statsmodels.formula.api as smf
regressor_OLS = smf.ols(formula='Y_variable ~ X_variable', data=df).fit()

instead of

regressor_OLS = sm.OLS(endog=y, exog=X).fit()
