Missing intercepts of OLS Regression models in Python statsmodels
Problem description
I am running a rolling (for example, 100-observation window) OLS regression estimation on the dataset found in this link (https://drive.google.com/drive/folders/0B2Iv8dfU4fTUMVFyYTEtWXlzYkk), which is in the following format.
time X Y
0.000543 0 10
0.000575 0 10
0.041324 1 10
0.041331 2 10
0.041336 3 10
0.04134 4 10
...
9.987735 55 239
9.987739 56 239
9.987744 57 239
9.987749 58 239
9.987938 59 239
The third column (Y) in my dataset is my true value - that's what I want to predict (estimate). I want to do a prediction of Y (i.e. predict the current value of Y according to the previous 3 rolling values of X). For this, I have the following Python script working using statsmodels.
# /usr/bin/python -tt
import pandas as pd
import numpy as np
import statsmodels.api as sm

df = pd.read_csv('estimated_pred.csv')
df = df.dropna()  # to drop NaNs in case there are any
window = 100
#print(df.index) # to print index
df['a'] = None   # constant (intercept)
df['b1'] = None  # beta1
df['b2'] = None  # beta2
for i in range(window, len(df)):
    temp = df.iloc[i - window:i, :]
    RollOLS = sm.OLS(temp.loc[:, 'Y'],
                     sm.add_constant(temp.loc[:, ['time', 'X']], has_constant='add')).fit()
    df.iloc[i, df.columns.get_loc('a')] = RollOLS.params[0]
    df.iloc[i, df.columns.get_loc('b1')] = RollOLS.params[1]
    df.iloc[i, df.columns.get_loc('b2')] = RollOLS.params[2]
# Predicted values in a row (using the coefficients from the previous window)
df['predicted'] = df['a'].shift(1) + df['b1'].shift(1)*df['time'] + df['b2'].shift(1)*df['X']
#print(df['predicted'])
print(temp)
Which gives me a sample output of the following format.
time X Y a b1 b2 predicted
0 0.000543 0 10 None None None NaN
1 0.000575 0 10 None None None NaN
2 0.041324 1 10 None None None NaN
3 0.041331 2 10 None None None NaN
4 0.041336 3 10 None None None NaN
.. ... .. .. ... ... ... ...
50 0.041340 4 10 10 0 1.55431e-15 NaN
51 0.041345 5 10 10 1.7053e-13 7.77156e-16 10
52 0.041350 6 10 10 1.74623e-09 -7.99361e-15 10
53 0.041354 7 10 10 6.98492e-10 -6.21725e-15 10
.. ... .. .. ... ... ... ...
509 0.160835 38 20 20 4.88944e-09 -1.15463e-14 20
510 0.160839 39 20 20 1.86265e-09 5.32907e-15 20
.. ... .. .. ... ... ... ...
Finally, I want to include the mean squared error (MSE) for all the predicted values (a summary of the OLS regression analysis). For example, if we look at row 5, the value of X is 2 and the value of Y is 10. Let's say the predicted value of y at the current row is 6, so the MSE would be (10-6)^2. sm.OLS returns an instance of the class <class 'statsmodels.regression.linear_model.OLS'>, and when we do print(RollOLS.summary()) we get:
OLS Regression Results
==============================================================================
Dep. Variable: Y R-squared: -inf
Model: OLS Adj. R-squared: -inf
Method: Least Squares F-statistic: -48.50
Date: Tue, 04 Jul 2017 Prob (F-statistic): 1.00
Time: 22:19:18 Log-Likelihood: 2359.7
No. Observations: 100 AIC: -4713.
Df Residuals: 97 BIC: -4706.
Df Model: 2
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [95.0% Conf. Int.]
------------------------------------------------------------------------------
const 239.0000 2.58e-09 9.26e+10 0.000 239.000 239.000
time 4.547e-13 2.58e-10 0.002 0.999 -5.12e-10 5.13e-10
X -3.886e-16 1.1e-13 -0.004 0.997 -2.19e-13 2.19e-13
==============================================================================
Omnibus: 44.322 Durbin-Watson: 0.000
Prob(Omnibus): 0.000 Jarque-Bera (JB): 86.471
Skew: -1.886 Prob(JB): 1.67e-19
Kurtosis: 5.556 Cond. No. 9.72e+04
==============================================================================
But the value of rsquared (print(RollOLS.rsquared)), for example, should have been between 0 and 1 instead of -inf, and this seems to be an issue with missing intercepts. If we want to print the mse, we do print(RollOLS.mse_model), etc., as per the documentation. How can we add the intercepts and print the regression statistics with the correct values, as we do for the predicted values? What am I doing wrong here? Or is there another way of doing this using the scikit-learn libraries?
Answer
Short answer
The value of r^2 is going to be +/- inf as long as y remains constant over the regression window (100 observations in your case). You can find more details below, but the intuition is that r^2 is the proportion of y's variance explained by X: if y's variance is zero, r^2 is simply not well defined.
Possible solution: Try to use a longer window, or resample Y and X so that Y does not remain constant for so many consecutive observations.
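A minimal pandas sketch of the resampling idea (the column names follow the estimated_pred.csv layout above; the toy values are made up): keep only the first row of each run of identical Y values, so every regression window sees variation in Y.

```python
import pandas as pd

# Toy frame mimicking the estimated_pred.csv layout (values are made up)
df = pd.DataFrame({
    "time": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    "X":    [0, 1, 2, 3, 4, 5],
    "Y":    [10, 10, 10, 20, 20, 30],
})

# Keep only the first row of each run of identical Y values, so that
# consecutive Y values in the reduced frame always differ
changed = df["Y"].ne(df["Y"].shift())
reduced = df.loc[changed].reset_index(drop=True)
```

On the toy frame this keeps the rows where Y first becomes 10, 20 and 30, discarding the repeats in between.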
Long answer
Looking at this I honestly think this is not the right dataset for the regression. This is a simple plot of the data:
Does a linear combination of X and time explain Y? Mmm... it doesn't look plausible. Y almost looks like a discrete variable, so you probably want to look at logistic regression.
To come to your question: R^2 is "the proportion of the variance in the dependent variable that is predictable from the independent variable(s)". From Wikipedia, R^2 = 1 - SS_res / SS_tot, where SS_tot = Σ_i (y_i - ȳ)^2 is proportional to the variance of y.
In your case it is very likely that Y is constant over the 100 observations, hence it has zero variance, which produces a division by zero, hence the inf.
So I am afraid you should not look for fixes in the code; instead you should rethink the problem and the way of fitting the data.