Pandas/statsmodels OLS: predicting future values


Question

I've been trying to predict future values from a model I've created. I have tried OLS in both pandas and statsmodels. Here is what I have in statsmodels:

import statsmodels.api as sm
endog = pd.DataFrame(dframe['monthly_data_smoothed8'])
smresults = sm.OLS(dframe['monthly_data_smoothed8'], dframe['date_delta']).fit()
sm_pred = smresults.predict(endog)
sm_pred

The returned array has as many elements as my original DataFrame has records, but the values are not the same as my data. When I do the following with pandas, no values are returned.

from pandas.stats.api import ols
res1 = ols(y=dframe['monthly_data_smoothed8'], x=dframe['date_delta'])
res1.predict

(Note that there is no .fit function for OLS in pandas.) Could somebody shed some light on how I might get future predictions from my OLS model in either pandas or statsmodels? I realize I must not be using .predict properly, and I've read about several other problems people have had, but they do not seem to apply to my case.

Edit: I believe 'endog' as defined is incorrect. I should be passing the values for which I want predictions, so I've created a date range of 12 periods past the last recorded value. But I am still missing something, as I get the error:

matrices are not aligned

Edit: here is a snippet of the data. The last column, date_delta, is the difference in months from the first date:

        month  monthly_data  monthly_data_smoothed5  monthly_data_smoothed8  monthly_data_smoothed12  monthly_data_smoothed3  date_delta
0  2011-01-31  3.711838e+11            3.711838e+11            3.711838e+11             3.711838e+11            3.711838e+11    0.000000
1  2011-02-28  3.776706e+11            3.750759e+11            3.748327e+11             3.746975e+11            3.755084e+11    0.919937
2  2011-03-31  4.547079e+11            4.127964e+11            4.083554e+11             4.059256e+11            4.207653e+11    1.938438
3  2011-04-30  4.688370e+11            4.360748e+11            4.295531e+11             4.257843e+11            4.464035e+11    2.924085

Recommended answer

I think your issue here is that statsmodels doesn't add an intercept by default, so your model is forced through the origin and doesn't achieve much of a fit. To fix it in your code, you would do something like this:

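import pandas as pd
import statsmodels.api as sm

dframe = pd.read_clipboard()  # your sample data
dframe['intercept'] = 1  # constant column so OLS estimates an intercept
X = dframe[['intercept', 'date_delta']]
y = dframe['monthly_data_smoothed8']

smresults = sm.OLS(y, X).fit()

# predict() with no arguments returns the in-sample fitted values
dframe['pred'] = smresults.predict()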

Also, for what it's worth, I think the statsmodels formula API is much nicer to work with when dealing with DataFrames, and it adds an intercept by default (append "- 1" to the formula to remove it). See below; it should give the same answer.

import statsmodels.formula.api as smf

smresults = smf.ols('monthly_data_smoothed8 ~ date_delta', dframe).fit()

dframe['pred'] = smresults.predict()
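
With the formula API, new data for prediction only needs the columns named in the formula, since the intercept is handled for you. A small sketch, assuming the four sample rows from the question and some made-up future date_delta values:

```python
import pandas as pd
import statsmodels.formula.api as smf

# The four sample rows from the question.
dframe = pd.DataFrame({
    "date_delta": [0.000000, 0.919937, 1.938438, 2.924085],
    "monthly_data_smoothed8": [3.711838e11, 3.748327e11, 4.083554e11, 4.295531e11],
})

fit = smf.ols("monthly_data_smoothed8 ~ date_delta", data=dframe).fit()

# Hypothetical future offsets, in months since the first date.
future = pd.DataFrame({"date_delta": [4.0, 5.0, 6.0]})

# Only date_delta is supplied; the formula adds the intercept term itself.
forecast = fit.predict(future)
print(forecast)
```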

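To predict future values, just pass new data to .predict(). The new data needs the same columns, in the same order, as the X the model was fit on, because this model matches columns by position. For example, using the first model:

new_X = pd.DataFrame({'intercept': 1, 'date_delta': [0.5, 0.75, 1.0]})
# select the columns explicitly so their order matches the training X
smresults.predict(new_X[['intercept', 'date_delta']])
# with the four sample rows this gives roughly 3.757e+11, 3.811e+11, 3.864e+11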

On the intercept: there's nothing special encoded in the number 1. It just follows from the math of OLS (an intercept is exactly equivalent to a regressor that always equals 1), so you can read its value straight off the summary. Looking at the statsmodels docs, an alternative way to add an intercept would be:

X = sm.add_constant(X)
