Forecasting with statsmodels

Question

I have a .csv file containing a 5-year time series of commodity prices with hourly resolution. Based on the historical data, I want to forecast the prices for the 6th year.

I have read a couple of articles online about this kind of procedure and based my code largely on the code posted there, since my knowledge of both Python (especially statsmodels) and statistics is limited at best.

Those are the links, for those who are interested:

http://www.seanabu.com/2016/03/22/time-series-seasonal-ARIMA-model-in-python/

http://www.johnwittenauer.net/a-simple-time-series-analysis-of-the-sp-500-index/

First of all, here is a sample of the .csv file. In this example the data has monthly resolution and is not real: the numbers were chosen at random for illustration (I hope one year is enough to develop a forecast for the second year; if not, the full .csv file is available):

              Price
2011-01-31    32.21
2011-02-28    28.32
2011-03-31    27.12
2011-04-30    29.56
2011-05-31    31.98
2011-06-30    26.25
2011-07-31    24.75
2011-08-31    25.56
2011-09-30    26.68
2011-10-31    29.12
2011-11-30    33.87
2011-12-31    35.45

My current progress is as follows:

After reading the input file and setting the date column as the datetime index, I used the following script to produce a forecast for the available data:

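import statsmodels.api as sm

# order=(p, d, q) = (1, 0, 0): one AR lag, no differencing, no MA term
model = sm.tsa.ARIMA(df['Price'].iloc[1:], order=(1, 0, 0))
results = model.fit(disp=-1)  # old statsmodels ARIMA API (as in 0.8); disp<0 silences output
df['Forecast'] = results.fittedvalues  # in-sample one-step-ahead predictions
df[['Price', 'Forecast']].plot(figsize=(16, 12))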

This plots the observed prices together with the fitted values.

Now, as I said, I have no background in statistics and little to no idea how I got to this output (basically, changing the order argument in the first line changes the output), but the 'actual' forecast looks quite good and I would like to extend it by another year (2016).
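
To see what order=(1, 0, 0) means, here is a minimal sketch in plain Python, assuming an AR(1) model y_t = c + phi * y_{t-1} + e_t whose coefficients are estimated by ordinary least squares. That estimator is a stand-in chosen for illustration; statsmodels fits ARIMA by maximum likelihood, so its coefficients will differ. The helper names fit_ar1 and fitted_values are hypothetical. The point is that every fitted value is built from the observed previous price, which is why the in-sample fit can look good even when multi-step forecasts are poor.

```python
# Sketch of an AR(1) model, i.e. ARIMA order (1, 0, 0):
#   y_t = c + phi * y_{t-1} + e_t
# Assumption: coefficients come from OLS on (y_{t-1}, y_t) pairs, not from
# the maximum-likelihood fit that statsmodels uses.


def fit_ar1(y):
    """Return (c, phi) from an OLS regression of y[t] on y[t-1]."""
    x, z = y[:-1], y[1:]
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    phi = sxz / sxx
    c = mz - phi * mx
    return c, phi


def fitted_values(y, c, phi):
    """One-step-ahead fitted values: each one uses the observed previous price."""
    return [c + phi * prev for prev in y[:-1]]


# The monthly sample prices from the question
prices = [32.21, 28.32, 27.12, 29.56, 31.98, 26.25,
          24.75, 25.56, 26.68, 29.12, 33.87, 35.45]

c, phi = fit_ar1(prices)
fits = fitted_values(prices, c, phi)
print(f"c = {c:.3f}, phi = {phi:.3f}")
print("first fitted value:", round(fits[0], 2))
```

Changing p, d or q changes which lagged values and error terms enter this equation, which is why editing the order argument changes the output.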

In order to do that, additional rows are created in the dataframe, as follows:

import pandas as pd

# One empty row per day of 2016 (a leap year, hence 366 periods)
date_list = pd.date_range('2016-01-01', freq='1D', periods=366)
future = pd.DataFrame(index=date_list, columns=df.columns)
data = pd.concat([df, future])

Finally, when I use the .predict function of statsmodels:

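# start/end are 0-based positions in the fitted sample; end is inclusive.
# dynamic=True feeds earlier forecasts back in instead of observed values.
data['Forecast'] = results.predict(start=1825, end=2192, dynamic=True)
data[['Price', 'Forecast']].plot(figsize=(12, 8))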
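
A quick check of the start/end arithmetic, assuming (hypothetically) daily observations from 2011-01-01 to 2015-12-31 and the .iloc[1:] slice used when fitting. Under those assumptions start=1825 is indeed 2016-01-01 in the model's own positions, but the last day of 2016 is position 2190, so end=2192 asks for 368 values rather than the 366 rows added for 2016. The helper name model_position is hypothetical, and the numbers change if the data is really hourly.

```python
from datetime import date

# Assumptions (hypothetical, for illustration): the series is daily,
# starts on 2011-01-01 and ends on 2015-12-31, and, as in the question,
# the model is fitted on df['Price'].iloc[1:], i.e. without the first day.
FIRST_DAY = date(2011, 1, 1)
DROPPED = 1  # observations removed by .iloc[1:]


def model_position(day):
    """0-based position of `day` in the sample the model was fitted on."""
    return (day - FIRST_DAY).days - DROPPED


start = model_position(date(2016, 1, 1))
end = model_position(date(2016, 12, 31))
print(start, end, end - start + 1)  # → 1825 2190 366 (end is inclusive)
```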

What I get as a forecast is a straight line, which doesn't look like a forecast at all. Moreover, if I extend the range, currently from the 1825th to the 2192nd day (the year 2016), to the whole six-year span, the forecast is a straight line over the entire period (2011-2016).
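
One likely reason for the straight line, shown as a minimal sketch assuming the fitted model is a stationary AR(1) with mean mu and |phi| < 1: the h-step-ahead point forecast is mu + phi**h * (y_T - mu), so the gap to the mean shrinks geometrically and, over a 366-day horizon, the path is flat after a few steps. The same happens with dynamic=True from early in the sample, because forecasts are then fed back in place of observed prices. The parameter values below are hypothetical, not estimates from the question's data, and ar1_forecast is a hypothetical helper.

```python
# Point forecasts of a stationary AR(1) (assumed: mean mu, |phi| < 1).
# Recursively f_h = mu + phi * (f_{h-1} - mu), which gives the closed form
# f_h = mu + phi**h * (y_T - mu): the gap to the mean decays geometrically.


def ar1_forecast(last_value, mu, phi, steps):
    """Return the 1..steps ahead point forecasts of a stationary AR(1)."""
    return [mu + phi ** h * (last_value - mu) for h in range(1, steps + 1)]


# Hypothetical parameters chosen for illustration only
path = ar1_forecast(last_value=35.45, mu=29.0, phi=0.6, steps=366)
print([round(v, 2) for v in path[:5]])  # moves quickly toward mu = 29.0
print(round(path[-1], 6))               # indistinguishable from mu
```

A flat long-horizon forecast is therefore what a low-order ARIMA without seasonal terms is expected to produce; capturing recurring patterns requires a model with seasonal components, such as SARIMAX.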

I have also tried the 'statsmodels.tsa.statespace.sarimax.SARIMAX.predict' method, which accounts for seasonal variation (which makes sense in this case), but I get an error saying 'module' has no attribute 'SARIMAX'. This is a secondary problem, though; I can go into more detail if needed.

Somewhere I am losing my grip, and I have no idea where. Thanks for reading. Cheers!

Recommended answer

It sounds like you are using an older version of statsmodels that does not support SARIMAX. You'll want to install the latest release, version 0.8.0; see http://statsmodels.sourceforge.net/devel/install.html.

I'm using Anaconda and installed via pip.

pip install -U statsmodels
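
After upgrading, a small check like the following can confirm that the module path mentioned in the question now imports. This is a sketch: the helper name sarimax_available is hypothetical, and it only tests whether statsmodels.tsa.statespace.sarimax can be imported and exposes SARIMAX.

```python
import importlib


def sarimax_available():
    """Return True if the installed statsmodels exposes SARIMAX."""
    try:
        module = importlib.import_module("statsmodels.tsa.statespace.sarimax")
    except ImportError:
        # statsmodels is missing, or too old to ship the statespace package
        return False
    return hasattr(module, "SARIMAX")


print("SARIMAX available:", sarimax_available())
```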

The results class from the SARIMAX model has a number of useful methods, including forecast:

data['Forecast'] = results.forecast(100)

This uses your fitted model to forecast 100 steps into the future.
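
forecast(100) covers only 100 steps. To cover all of 2016, the number of steps has to match the data frequency. The hypothetical helper below counts them, assuming one observation per day (or per hour) with no gaps and a sample that ends on 2015-12-31.

```python
import calendar


def steps_for_year(year, per_day=1):
    """Hypothetical helper: observations in one calendar year, no gaps assumed."""
    days = 366 if calendar.isleap(year) else 365
    return days * per_day


print(steps_for_year(2016))              # → 366 (daily data)
print(steps_for_year(2016, per_day=24))  # → 8784 (hourly data)
```

With hourly data, the call would then be results.forecast(steps_for_year(2016, per_day=24)), since the first argument of forecast is the number of steps.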
