ARIMA modeling on a time-series DataFrame in Python

Question

I'm trying to use an ARIMA model for forecasting, and I'm new to it. I have plotted the seasonal_decompose() output for my dataset (hourly data).

I want to understand these plots; a brief description would be helpful. I see no trend initially and then, after some time, an upward trend. Am I reading this correctly? I'd like to learn how to read these graphs properly, so a good description would be appreciated.

When I applied the Dickey-Fuller test to check whether my data is stationary and whether it needs further differencing, I got the results below:

Test Statistic                   -4.117543
p-value                           0.000906
Lags Used                       30.000000
Number of Observations Used    4289.000000
Critical Value (1%)              -3.431876
Critical Value (5%)              -2.862214
Critical Value (10%)             -2.567129

I'm referring to two links to understand this: http://www.seanabu.com/2016/03/22/time-series-seasonal-ARIMA-model-in-python/

This link says that when the test statistic is greater than the critical value, the data is stationary; the other link says the opposite. This confuses me. I also looked at otexts.org, which says we should decide on the basis of the p-value. How should I interpret the results given by the ADF test?

Also, I tried to apply an ARIMA model to the dataset:

from statsmodels.tsa.arima_model import ARIMA
model = ARIMA(df.y, order=(0,1,0))
model_fit = model.fit()

My DataFrame has a datetime column as its index, and the y column holds float values. When I apply the model to this DataFrame, I get the following error:

IndexError: list index out of range

The error occurs when I try to print the model summary using:

print(model_fit.summary())

Please help me with this so that I can get a better understanding of ARIMA.

Recommended answer

Cross-validation for ARIMA (AutoRegressive Integrated Moving Average) time series: K-fold cross-validation does not work for time series, because random folds let the model train on observations that come after the ones it is tested on. Instead, use backtesting techniques such as walk-forward validation and rolling windows.
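Walk-forward validation can be sketched in plain Python as follows. This is a minimal illustration, not the answerer's code: the function names (`walk_forward_splits`, `naive_forecast`, `walk_forward_mae`) are hypothetical, and the "model" is a placeholder that repeats the last observed value. To backtest ARIMA, the placeholder would be replaced by a fit-and-forecast call on each training window.

```python
from statistics import mean


def walk_forward_splits(n_obs, initial_train, horizon=1, step=1, window=None):
    """Yield (train_indices, test_indices) pairs in time order.

    window=None gives an expanding training window; an integer gives a
    rolling window holding only that many most recent observations.
    """
    end = initial_train
    while end + horizon <= n_obs:
        start = 0 if window is None else max(0, end - window)
        yield range(start, end), range(end, end + horizon)
        end += step


def naive_forecast(train_values, horizon):
    # Placeholder model (an assumption for illustration): repeat the last
    # observed value. Swap in an ARIMA fit and forecast here.
    return [train_values[-1]] * horizon


def walk_forward_mae(series, initial_train, horizon=1, step=1, window=None):
    """Mean absolute error over all walk-forward test points."""
    errors = []
    splits = walk_forward_splits(len(series), initial_train, horizon, step, window)
    for train_idx, test_idx in splits:
        train = [series[i] for i in train_idx]
        actual = [series[i] for i in test_idx]
        predicted = naive_forecast(train, horizon)
        errors.extend(abs(a - p) for a, p in zip(actual, predicted))
    return mean(errors)


series = [10.0, 12.0, 13.0, 12.0, 15.0, 16.0, 18.0, 17.0]
print(walk_forward_mae(series, initial_train=4))  # 1.75
```

Every test point lies strictly after its training data, which is the property K-fold splitting breaks.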

K-fold cross-validation for autoregression: although cross-validation is (usually) not valid for time-series (ARIMA) models, K-fold works for autoregressions as long as the models considered have uncorrelated errors and you have verified this with the Ljung-Box test, for XAI (Explainable Artificial Intelligence) in time-series use cases.
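The Ljung-Box statistic mentioned here is Q = n(n+2) Σ ρ_k² / (n − k), summed over lags k = 1..h, where ρ_k is the lag-k sample autocorrelation of the residuals. A minimal pure-Python sketch follows, assuming the residuals are a plain list of floats; the function names are illustrative and not taken from any library. In practice, statsmodels provides `acorr_ljungbox` in `statsmodels.stats.diagnostic`, which also reports p-values.

```python
def autocorrelation(values, lag):
    """Sample autocorrelation of `values` at the given positive lag."""
    n = len(values)
    m = sum(values) / n
    denom = sum((v - m) ** 2 for v in values)
    num = sum((values[t] - m) * (values[t - lag] - m) for t in range(lag, n))
    return num / denom


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    total = sum(
        autocorrelation(residuals, k) ** 2 / (n - k)
        for k in range(1, max_lag + 1)
    )
    return n * (n + 2) * total


# Strongly alternating residuals: clearly autocorrelated at lag 1.
residuals = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
q = ljung_box_q(residuals, max_lag=1)
# Compare q with a chi-square critical value (about 3.841 at the 5% level
# for 1 degree of freedom); a larger q suggests correlated errors.
print(q)
```

A large Q relative to the chi-square critical value indicates that the errors are still correlated, in which case the condition for using K-fold does not hold.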

A few Python statistics libraries provide these methods; here are two: Python Stats Tests and Python StatsModels.

To get the diff of values, you can enforce int8 values using Python 3.6+ PEP 487 descriptors, which let you enforce a typed list that always returns int8 values, for faster computation as well (list : list -> list of ints):

list_a = [1, 2, 3]
list_b = [2, 3]
# Elements of list_a that are not in list_b (a set difference)
print(set(list_a).difference(set(list_b)))  # prints {1}
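Note that the snippet above computes a set difference. If "diff" is meant in the time-series sense (the differencing step that the d in ARIMA(p, d, q) refers to, and that the question asks about), the operation is to subtract each value from the one that follows it. A minimal sketch under that assumption, using a hypothetical `difference` helper on a plain list; for a pandas Series, `Series.diff()` does the equivalent and leaves NaN in the first position.

```python
def difference(values, lag=1):
    """Return values[i] - values[i - lag] for every i >= lag."""
    return [values[i] - values[i - lag] for i in range(lag, len(values))]


hourly = [5.0, 7.0, 10.0, 14.0, 19.0]

first = difference(hourly)   # first-order differencing (d=1)
second = difference(first)   # second-order differencing (d=2)

print(first)   # [2.0, 3.0, 4.0, 5.0]
print(second)  # [1.0, 1.0, 1.0]
```

Each round of differencing shortens the series by `lag` observations.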
