How to properly set start/end params of statsmodels.tsa.ar_model.AR.predict function


Problem description


I have a dataframe of project costs from an irregularly spaced time series that I would like to try to apply the statsmodel AR model against.

This is a sample of the data in its dataframe:

               cost
date               
2015-07-16    35.98
2015-08-11    25.00
2015-08-11    43.94
2015-08-13    26.25
2015-08-18    15.38
2015-08-24    77.72
2015-09-09    40.00
2015-09-09    20.00
2015-09-09    65.00
2015-09-23    70.50
2015-09-29    59.00
2015-11-03    19.25
2015-11-04    19.97
2015-11-10    26.25
2015-11-12    19.97
2015-11-12    23.97
2015-11-12    21.88
2015-11-23    23.50
2015-11-23    33.75
2015-11-23    22.70
2015-11-23    33.75
2015-11-24    27.95
2015-11-24    27.95
2015-11-24    27.95
...
2017-03-31    21.93
2017-04-06    22.45
2017-04-06    26.85
2017-04-12    60.40
2017-04-12    37.00
2017-04-12    20.00
2017-04-12    66.00
2017-04-12    60.00
2017-04-13    41.95
2017-04-13    25.97
2017-04-13    29.48
2017-04-19    41.00
2017-04-19    58.00
2017-04-19    78.00
2017-04-19    12.00
2017-04-24    51.05
2017-04-26    21.88
2017-04-26    50.05
2017-04-28    21.00
2017-04-28    30.00

I am having a hard time understanding how to use start and end in the predict function.

According to the docs:

start : int, str, or datetime Zero-indexed observation number at which to start forecasting, ie., the first forecast is start. Can also be a date string to parse or a datetime type.

end : int, str, or datetime Zero-indexed observation number at which to end forecasting, ie., the first forecast is start. Can also be a date string to parse or a datetime type.
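Per the docs quoted above, a date passed as `start` or `end` stands for a zero-indexed observation number. As a rough illustration (pandas only, no model involved), the sketch below uses a hypothetical daily index covering the same range as the question's `df` to show which positions the question's dates correspond to, and what position a date past the end of the sample would have. The variable names are illustrative.

```python
import pandas as pd

# Hypothetical daily index spanning the same range as the question's df.
idx = pd.date_range("2015-01-01", "2017-12-31", freq="D")

# On a unique daily index, each date resolves to one zero-indexed position.
start_pos = idx.get_loc(pd.Timestamp("2016-01-01"))
end_pos = idx.get_loc(pd.Timestamp("2016-06-01"))
print(start_pos, end_pos)           # 365 517
print(end_pos - start_pos + 1)      # 153 values if both ends are included

# A date after the last observation has no position in the index; with a
# daily frequency its would-be position is simply the offset in days.
oos_pos = (pd.Timestamp("2018-12-31") - idx[0]).days
print(len(idx), oos_pos)            # 1096 1460
```

Any integer position at or beyond `len(idx)` therefore refers to an out-of-sample step.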

I create a dataframe that has an empty daily time series, add my irregularly spaced time series data to it, and then try to apply the model.

import pandas as pd
import statsmodels.api as sm
from datetime import datetime
data = pd.read_csv('data.csv', index_col=1, parse_dates=True)
df = pd.DataFrame(index=pd.date_range(start=datetime(2015, 1, 1), end=datetime(2017, 12, 31), freq='d'))
df = df.join(data)
df.cost.interpolate(inplace=True)
ar_model = sm.tsa.AR(df, missing='drop', freq='D')
ar_res = ar_model.fit(maxlag=9, method='mle', disp=-1)
pred = ar_res.predict(start='2016', end='2016')

The predict call raises pandas.tslib.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 605-12-31 00:00:00

If I try to use a more specific date, I get the same type of error:

pred = ar_res.predict(start='2016-01-01', end='2016-06-01')    

If I try to use integers, I get a different error:

pred = ar_res.predict(start=0, end=len(data))
Wrong number of items passed 202, placement implies 197

If I pass an actual datetime object instead, I get an error that reads "no rule for interpreting end".

I am hitting a wall here, so I suspect I am missing something fundamental.

Ultimately, I would like to use the model to get out-of-sample predictions (such as a prediction for next quarter).
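For the out-of-sample goal, "next quarter" first has to become concrete dates for `start` and `end`. A minimal pandas sketch, assuming calendar quarters (Jan/Apr/Jul/Oct) and taking 2017-04-28, the last row of the sample, as the final observation; `last_obs`, `next_q_start`, `next_q_end` and `horizon` are illustrative names:

```python
import pandas as pd

# Assumption: the last observed date is the final row of the sample above.
last_obs = pd.Timestamp("2017-04-28")

# Calendar quarters start in Jan/Apr/Jul/Oct, hence startingMonth=1;
# QuarterEnd with startingMonth=3 ends in Mar/Jun/Sep/Dec.
next_q_start = last_obs + pd.offsets.QuarterBegin(startingMonth=1)
next_q_end = next_q_start + pd.offsets.QuarterEnd(startingMonth=3)
print(next_q_start.date(), next_q_end.date())   # 2017-07-01 2017-09-30

# Number of daily steps from the last observation to the end of next quarter.
horizon = (next_q_end - last_obs).days
print(horizon)                                  # 155
```

On a daily series with a set frequency, `next_q_start` and `next_q_end` are the kind of values that could be passed as `start` and `end`.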

Solution

So I was creating a daily index to satisfy the equally spaced time series requirement, but the index was still non-unique: joining the raw data onto the daily index repeats a date once for every cost recorded on it (comment by @user333700).
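The duplicates come from the join itself: a left join keeps every day of the new index and repeats a day once for each matching cost. A small pandas sketch, using three rows from the sample and a five-day window chosen for illustration:

```python
import pandas as pd

# Three rows from the sample; 2015-08-11 appears twice.
data = pd.DataFrame(
    {"cost": [25.00, 43.94, 26.25]},
    index=pd.to_datetime(["2015-08-11", "2015-08-11", "2015-08-13"]),
)

# An empty frame on a daily grid, joined the same way as in the question.
daily = pd.DataFrame(index=pd.date_range("2015-08-10", "2015-08-14", freq="D"))
joined = daily.join(data)

print(len(daily), len(joined))   # 5 6: the repeated date adds a row
print(joined.index.is_unique)    # False
```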

I added a groupby function to sum duplicate dates together, and could then run the predict function using datetime objects (answer by @andy-hayden).

df = df.groupby(pd.Grouper(freq='D')).sum()  # older pandas: pd.TimeGrouper(freq='D')
...
ar_res.predict(start=min(df.index), end=datetime(2018,12,31))

With the predict function providing a result, I am now able to analyze the results and tweak the params to get something useful.
