ARIMA modeling on a time-series dataframe in Python


Question

I'm trying to use an ARIMA model for forecasting, and I'm new to it. I have plotted seasonal_decompose() of my dataset (hourly data); the plot is below.

I want to understand these plots; a brief description would be helpful. I see that there is no trend initially and that an upward trend appears after some time. Am I reading this correctly? I want to understand how to read these graphs properly.

When I apply the Dickey-Fuller test to check whether my data is stationary and whether I need further differencing, I get the following results:

Test Statistic                   -4.117543
p-value                           0.000906
Lags Used                       30.000000
Number of Observations Used    4289.000000
Critical Value (1%)              -3.431876
Critical Value (5%)              -2.862214
Critical Value (10%)             -2.567129

I'm referring to two links to understand this: http://www.seanabu.com/2016/03/22/time-series-seasonal-ARIMA-model-in-python/

This link says that when the test statistic is greater than the critical value, the data is stationary; the other link says the opposite. This confuses me. I also consulted otexts.org, which says we should decide on the basis of the p-value. How should I interpret the results given by the ADF test?
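For reference, here is a minimal sketch of the usual reading of this output. It assumes the standard ADF setup, where the null hypothesis is that the series has a unit root (is non-stationary). Under that assumption the null is rejected when the test statistic is more negative than the critical value, which goes together with a small p-value. The dictionary layout and the helper name reject_unit_root are illustrative and not part of any library; the numbers are the ones printed above.

```python
# ADF output copied from the results printed above.
adf_result = {
    "test_statistic": -4.117543,
    "p_value": 0.000906,
    "critical_values": {"1%": -3.431876, "5%": -2.862214, "10%": -2.567129},
}


def reject_unit_root(result, level="5%", alpha=0.05):
    """Return True if both criteria reject the unit-root null hypothesis.

    Hypothetical helper: the statistic must be below (more negative than)
    the critical value at `level`, and the p-value must be below `alpha`.
    """
    below_critical = result["test_statistic"] < result["critical_values"][level]
    small_p_value = result["p_value"] < alpha
    return below_critical and small_p_value


for level, alpha in (("10%", 0.10), ("5%", 0.05), ("1%", 0.01)):
    verdict = "reject unit root" if reject_unit_root(adf_result, level, alpha) else "cannot reject"
    print(f"level {level}: {verdict}")
```

With these numbers, -4.117543 lies below all three critical values and the p-value is well under 0.05, so the two criteria point the same way.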

Also, when I tried to apply an ARIMA model to the dataset:

from statsmodels.tsa.arima_model import ARIMA
model = ARIMA(df.y, order=(0,1,0))
model_fit = model.fit()

My dataframe has a datetime column as its index, and the y column holds float values. When I apply the model to this dataframe, I get the following error:

IndexError: list index out of range

This error occurs when I try to print the model summary using:

print(model_fit.summary())

Please help me with this, so that I can get a better understanding of ARIMA.

Recommended answer

Cross-validation for ARIMA (AutoRegressive Integrated Moving Average) time series: K-fold cross-validation does not work for time series, because randomly assigned folds ignore the temporal order of the observations. Instead, use backtesting techniques such as walk-forward validation and rolling windows.
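A minimal sketch of walk-forward backtesting, using only the standard library. The function names, the made-up series, and the naive last-value forecaster are all assumptions for illustration; in practice the forecaster would be replaced by a fitted model such as ARIMA. Passing max_train turns the expanding window into a rolling window of fixed maximum length.

```python
from statistics import mean


def walk_forward_splits(n_obs, initial_train, horizon=1, max_train=None):
    """Yield (train_indices, test_indices) pairs in temporal order.

    With max_train=None the training window expands; otherwise it rolls,
    keeping at most max_train of the most recent observations.
    """
    start = initial_train
    while start + horizon <= n_obs:
        train_start = 0 if max_train is None else max(0, start - max_train)
        yield range(train_start, start), range(start, start + horizon)
        start += horizon


def naive_forecast(history, steps):
    """Placeholder model (an assumption): repeat the last observed value."""
    return [history[-1]] * steps


# Made-up data, for illustration only.
series = [10.0, 12.0, 11.0, 13.0, 14.0, 13.5, 15.0, 16.0]

abs_errors = []
for train_idx, test_idx in walk_forward_splits(len(series), initial_train=4, horizon=2):
    history = [series[i] for i in train_idx]   # only past observations
    actual = [series[i] for i in test_idx]     # the next `horizon` points
    predicted = naive_forecast(history, len(actual))
    abs_errors.extend(abs(a - p) for a, p in zip(actual, predicted))

mae = mean(abs_errors)
print(f"evaluated {len(abs_errors)} out-of-sample points, MAE = {mae:.3f}")
```

Every test point lies strictly after its training window, which is the property that shuffled K-fold splits do not guarantee.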

K-fold cross-validation for autoregression: Although cross-validation is (usually) not valid for time series (ARIMA) models, K-fold works for autoregressions as long as the models considered have uncorrelated errors, which you should check with the Ljung-Box test. This also applies to XAI (Explainable Artificial Intelligence) in time-series use cases.
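A minimal sketch of the Ljung-Box statistic, written with the standard library only so that the formula is visible: Q = n(n+2) * sum over k = 1..h of r_k^2 / (n - k), where r_k is the lag-k sample autocorrelation of the residuals. The function names and the sample data are assumptions for illustration. statsmodels also provides acorr_ljungbox in statsmodels.stats.diagnostic, which additionally returns p-values. Here Q is compared with 11.07, the 5% critical value of a chi-square distribution with 5 degrees of freedom; this assumes no adjustment for fitted model parameters.

```python
import random


def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    m = sum(x) / n
    denom = sum((v - m) ** 2 for v in x)
    num = sum((x[t] - m) * (x[t - lag] - m) for t in range(lag, n))
    return num / denom


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


CHI2_CRIT_5PCT_DF5 = 11.07  # chi-square critical value, 5 degrees of freedom, 5% level

# Illustrative residuals: seeded Gaussian noise versus a strongly trending series.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(200)]
trend = [float(i) for i in range(50)]

q_noise = ljung_box_q(noise, 5)
q_trend = ljung_box_q(trend, 5)

for name, q in (("noise", q_noise), ("trend", q_trend)):
    verdict = "correlated" if q > CHI2_CRIT_5PCT_DF5 else "no evidence of correlation"
    print(f"{name}: Q = {q:.2f} -> {verdict}")
```

A large Q means the residuals are still autocorrelated, in which case the condition stated above for using K-fold does not hold.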

A few Python statistics libraries provide these methods; here are two: Python Stats Tests and Python StatsModels.

To get the difference between two lists of values, you can use Python sets. For faster computation, you can also enforce int8 values using Python 3.6+ PEP 487 descriptors, with a typed list that always returns int8s (list : list -> list of ints):

list_a = [1, 2, 3]
list_b = [2, 3]
# Elements of list_a that do not appear in list_b
print(set(list_a).difference(set(list_b)))  # prints {1}
