Python ARIMA exogenous variable out of sample

Views: 442 Published: 2020/5/18 21:00:37 Tags: python numpy statsmodels predict

Problem description

I am trying to forecast a time series with the ARIMA class in Python's statsmodels package, including one exogenous variable, but I cannot figure out the correct way to pass the exogenous variable in the predict step. See here for docs.

import numpy as np
from scipy import stats
import pandas as pd

import statsmodels.api as sm

vals = np.random.rand(13)
ts = pd.TimeSeries(vals)
df = pd.DataFrame(ts, columns=["test"])
df.index = pd.Index(pd.date_range("2011/01/01", periods = len(vals), freq = 'Q'))

fit1 = sm.tsa.ARIMA(df, (1,0,0)).fit()
#this works fine:
pred1 = fit1.predict(start=12, end = 16)
print(pred1)

Out[32]: 
2014-03-31    0.589121
2014-06-30    0.747575
2014-09-30    0.631322
2014-12-31    0.654858
2015-03-31    0.650093
Freq: Q-DEC, dtype: float64

Now add a trend exogenous variable:

exogx = np.array(range(1,14))
#to make this easy, let's look at the ols of the trend (arima(0,0,0))
fit2 = sm.tsa.ARIMA(df, (0,0,0),exog = exogx).fit()
print(fit2.params)

const    0.555226
x1       0.013132
dtype: float64

print(fit2.fittedvalues)

2011-03-31    0.568358
2011-06-30    0.581490
2011-09-30    0.594622
2011-12-31    0.607754
2012-03-31    0.620886
2012-06-30    0.634018
2012-09-30    0.647150
2012-12-31    0.660282
2013-03-31    0.673414
2013-06-30    0.686546
2013-09-30    0.699678
2013-12-31    0.712810
2014-03-31    0.725942
Freq: Q-DEC, dtype: float64

Notice that, as we would expect, this is a trend line that increases by 0.013132 with each time step (this is random data, so the values will differ if you run it, but the story of a constant positive or negative trend will be the same). So the next value (for time = 14) should be 0.555226 + 0.013132*14 = 0.739074.
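The arithmetic above can be checked by hand. This is only a minimal sketch, assuming the rounded coefficients printed by fit2.params; the names const, beta, t_future and expected are illustrative and are not part of statsmodels.

```python
import numpy as np

# Rounded coefficients copied from the fit2.params output above (assumed values)
const = 0.555226
beta = 0.013132

# The in-sample trend runs over t = 1..13, so the next steps are t = 14, 15, 16
t_future = np.arange(14, 17)

# Expected forecast for a pure trend model: intercept + slope * t
expected = const + beta * t_future

for t, value in zip(t_future, expected):
    print(t, round(float(value), 6))
```

If predict applied the out-of-sample exog values one step at a time, these are the numbers I would expect to see for the forecast steps.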

#out of sample exog should be (14,15,16)
pred2 = fit2.predict(start = 12, end = 16, exog = np.array(range(13,17)))
print(pred2)
2014-03-31    0.725942
2014-06-30    0.568358
2014-09-30    0.581490
2014-12-31    0.594622
2015-03-31    0.765338
Freq: Q-DEC, dtype: float64

So 2014-03-31 (the last in-sample observation) is predicted correctly, but 2014-06-30 starts back at the beginning (t = 1). Notice, however, that 2015-03-31 (in fact, always the last observation of the forecast, regardless of horizon) picks up t = 16; that is, (value - intercept)/beta = (0.765338 - 0.555226)/0.013132 = 16.
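The same back-calculation can be applied to every predicted value to see which trend value each one appears to have used. A rough sketch, assuming the rounded coefficients and the pred2 values printed above; implied_t is an illustrative name, not a statsmodels attribute.

```python
import numpy as np

# Rounded coefficients copied from the fit2.params output (assumed values)
const = 0.555226
beta = 0.013132

# Values copied from the pred2 output, 2014-03-31 through 2015-03-31
pred2_values = np.array([0.725942, 0.568358, 0.581490, 0.594622, 0.765338])

# Invert value = const + beta * t to recover the t each prediction implies
implied_t = (pred2_values - const) / beta

print(np.round(implied_t, 2))
```

With these inputs the implied trend values come out close to 13, 1, 2, 3 and 16, which is the pattern described above: the middle forecasts reuse the start of the in-sample exog, and only the last one reflects the exog I passed in.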

To make this clearer, notice what happens when I inflate the values of the x matrix:

fit2.predict(start = 12, end = 16, exog = np.array(range(13,17))*10000)
Out[41]: 
2014-03-31       0.725942
2014-06-30       0.568358
2014-09-30       0.581490
2014-12-31       0.594622
2015-03-31    2101.680532
Freq: Q-DEC, dtype: float64

See that 2015-03-31 exploded, but none of the other xmat values were considered? What am I doing wrong here?
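For comparison, here is what the forecasts would look like if every inflated exog value were actually applied. This is a sketch under the assumption that the rounded coefficients from fit2.params are close enough to the fitted ones; scaled_exog and expected_scaled are illustrative names.

```python
import numpy as np

# Rounded coefficients copied from the fit2.params output (assumed values)
const = 0.555226
beta = 0.013132

# The inflated exog passed to predict: 130000, 140000, 150000, 160000
scaled_exog = np.arange(13, 17) * 10000

# What a pure trend model would return if each exog value were used
expected_scaled = const + beta * scaled_exog

# Last value actually returned by predict, copied from the output above
observed_last = 2101.680532

print(np.round(expected_scaled, 3))
print(abs(expected_scaled[-1] - observed_last))
```

Only the last expected value lines up with what predict returned (the small remaining gap is consistent with the coefficients being rounded in the printout); the other forecasts stay near 0.57 to 0.59 instead of being in the thousands.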

I have tried passing in the exog variable every way I know how (changing its dimensions, making the exog a matrix, making the exog as long as the input plus the horizon, and so on). Any suggestions would be really appreciated.

I am using Python 2.7 from Anaconda 2.1, with numpy 1.8.1, scipy 0.14.0, pandas 0.14.0 and statsmodels 0.5.0, and have verified the issue on Windows 7 64-bit and CentOS 64-bit.

Also, a few things. I am using ARIMA for the ARIMA functionality, and the above is just for illustration (that is, I cannot "just use OLS", as I imagine will be suggested). I also cannot "just use R", due to the restrictions of the project (and, more generally, the lack of support for R in base Spark).

Here are the interesting parts of the code all together, in case you want to try it yourself:

import numpy as np
from scipy import stats
import pandas as pd
import statsmodels.api as sm

vals = np.random.rand(13)
ts = pd.TimeSeries(vals)
df = pd.DataFrame(ts, columns=["test"])
df.index = pd.Index(pd.date_range("2011/01/01", periods = len(vals), freq = 'Q'))

exogx = np.array(range(1,14))
fit2 = sm.tsa.ARIMA(df, (0,0,0),exog = exogx).fit()
print(fit2.fittedvalues)
pred2 = fit2.predict(start = 12, end = 16, exog = np.array(range(13,17))*10000)
print(pred2)
