Converting statsmodels summary object to Pandas DataFrame


Question

I am doing multiple linear regression with statsmodels.formula.api (version 0.9.0) on Windows 10. After fitting the model and requesting the summary with the following lines, I get the result as a summary object.

import statsmodels.api as sm  # assumed import; X and y are the asker's arrays

X_opt = X[:, [0, 1, 2, 3]]
regressor_OLS = sm.OLS(endog=y, exog=X_opt).fit()
regressor_OLS.summary()


                          OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.951
Model:                            OLS   Adj. R-squared:                  0.948
Method:                 Least Squares   F-statistic:                     296.0
Date:                Wed, 08 Aug 2018   Prob (F-statistic):           4.53e-30
Time:                        00:46:48   Log-Likelihood:                -525.39
No. Observations:                  50   AIC:                             1059.
Df Residuals:                      46   BIC:                             1066.
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const       5.012e+04   6572.353      7.626      0.000    3.69e+04    6.34e+04
x1             0.8057      0.045     17.846      0.000       0.715       0.897
x2            -0.0268      0.051     -0.526      0.602      -0.130       0.076
x3             0.0272      0.016      1.655      0.105      -0.006       0.060
==============================================================================
Omnibus:                       14.838   Durbin-Watson:                   1.282
Prob(Omnibus):                  0.001   Jarque-Bera (JB):               21.442
Skew:                          -0.949   Prob(JB):                     2.21e-05
Kurtosis:                       5.586   Cond. No.                     1.40e+06
==============================================================================

I want to do backward elimination on the p-values at a significance level of 0.05. For this I need to remove the predictor with the highest p-value and run the code again.

I wanted to know if there is a way to extract the p-values from the summary object, so that I can run a loop with a conditional statement and find the significant variables without repeating the steps manually.

Thanks.

Recommended answer

The answer from @Michael B works well, but requires "recreating" the table. The table itself is directly available from the summary().tables attribute. This attribute is a list of tables, and each one is a SimpleTable, which has methods for outputting different formats. We can then read any of those formats back in as a pd.DataFrame:

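from io import StringIO

import pandas as pd
import statsmodels.api as sm

# y and x are the response and the design matrix prepared earlier
model = sm.OLS(y, x)
results = model.fit()
results_summary = results.summary()

# Note that tables is a list. The table at index 1 is the "core" table.
# Additionally, read_html returns a list of DataFrames, so we want index 0.
# pandas 2.1+ deprecates passing a literal HTML string, hence the StringIO wrapper.
results_as_html = results_summary.tables[1].as_html()
pd.read_html(StringIO(results_as_html), header=0, index_col=0)[0]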
