How to invert differencing in a Python statsmodels ARIMA forecast?

Question

I'm trying to wrap my head around ARIMA forecasting using Python and Statsmodels. Specifically, for the ARIMA algorithm to work, the data needs to be made stationary via differencing (or similar method). The question is: How does one invert the differencing after the residual forecast has been made to get back to a forecast including the trend and seasonality that was differenced out?

(I saw a similar question here but alas, no answers have been posted.)

Here's what I've done so far (based on the example in the last chapter of Mastering Python Data Analysis, Magnus Vilhelm Persson; Luiz Felipe Martins). The data comes from DataMarket.

%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
from statsmodels import tsa 
from statsmodels.tsa import stattools as stt 
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.arima_model import ARIMA  # legacy API; this module was removed in statsmodels 0.13

def is_stationary(df, maxlag=15, autolag=None, regression='ct'): 
    """Test if df is stationary using Augmented 
    Dickey Fuller""" 

    adf_test = stt.adfuller(df,maxlag=maxlag, autolag=autolag, regression=regression) 
    adf = adf_test[0]
    cv_5 = adf_test[4]["5%"]

    result = adf < cv_5    
    return result

def d_param(df, max_lag=12):
    d = 0
    for i in range(1, max_lag):
        if is_stationary(df.diff(i).dropna()):
            d = i
            break
    return d

def ARMA_params(df):
    p, q = tsa.stattools.arma_order_select_ic(df.dropna(),ic='aic').aic_min_order
    return p, q

# read data
carsales = pd.read_csv('data/monthly-car-sales-in-quebec-1960.csv', 
                   parse_dates=['Month'],  
                   index_col='Month',  
                   date_parser=lambda d: pd.to_datetime(d, format='%Y-%m'))
carsales = carsales.iloc[:,0] 

# get components
carsales_decomp = seasonal_decompose(carsales, freq=12)
residuals = carsales - carsales_decomp.seasonal - carsales_decomp.trend 
residuals = residuals.dropna()

# fit model
d = d_param(carsales, max_lag=12)
p, q = ARMA_params(residuals)
model = ARIMA(residuals, order=(p, d, q)) 
model_fit = model.fit() 

# plot prediction
model_fit.plot_predict(start='1961-12-01', end='1970-01-01', alpha=0.10) 
plt.legend(loc='upper left') 
plt.xlabel('Year') 
plt.ylabel('Sales')
plt.title('Residuals 1960-1970')
print(model_fit.aic, model_fit.bic)

The resulting plot is satisfying, but it doesn't include the trend and seasonality information. How do I invert the differencing to recapture the trend and seasonality?

[Figure: residual plot]

Recommended answer

You are relying on differencing when a time trend (or several time trends) may be a better strategy. Period 33 is an outlier, and ignoring it has consequences.
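As a rough illustration of the time-trend alternative mentioned above, the sketch below fits a straight-line trend by ordinary least squares, models what is left over, and adds the extrapolated trend back to the forecast. It is only a sketch under stated assumptions: the `sales` numbers are made up, the helper names `fit_linear_trend` and `trend_at` are hypothetical, and the residual forecast is a placeholder of zeros standing in for whatever an ARMA model fitted to the detrended series would produce.

```python
def fit_linear_trend(values):
    """Least-squares fit of values[t] ~ intercept + slope * t, with t = 0, 1, 2, ..."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope


def trend_at(intercept, slope, t):
    """Value of the fitted trend line at time index t."""
    return intercept + slope * t


# Made-up series, for illustration only (not the car sales data).
sales = [100.0, 103.0, 105.0, 108.0, 110.0, 113.0]

intercept, slope = fit_linear_trend(sales)

# Detrended series: this is what a stationary model would be fitted to.
detrended = [y - trend_at(intercept, slope, t) for t, y in enumerate(sales)]

# Placeholder forecast of the detrended series (assumption: zeros).
horizon = 3
residual_forecast = [0.0] * horizon

# Final forecast = extrapolated trend + forecast of the detrended series.
forecast = [
    trend_at(intercept, slope, len(sales) + h) + residual_forecast[h]
    for h in range(horizon)
]
print(forecast)
```

Because the trend is an explicit function of time, nothing has to be "inverted": the trend is simply evaluated at the future time indices and added back.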

The PACF doesn't show a strong seasonal component.

It is a weak seasonal AR, with March, April, May and June showing strong correlation.
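On the question as literally asked: if a model was fitted to a differenced series, its forecasts are forecasts of changes, and they can be turned back into levels by adding each one to the value `lag` steps earlier, starting from the last observed values. The following is a minimal plain-Python sketch of that idea, not the statsmodels implementation; `difference` and `invert_difference` are hypothetical helper names and the numbers are made up.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def invert_difference(history, diff_forecast, lag=1):
    """Turn forecasts of a lag-differenced series back into level forecasts.

    history       : observed values on the original scale (at least `lag` of them)
    diff_forecast : forecasts of the differenced series, continuing from history
    """
    extended = list(history)
    for step in diff_forecast:
        # The value `lag` steps back is either observed or already reconstructed.
        extended.append(extended[-lag] + step)
    return extended[len(history):]


# Made-up observations and made-up forecasts of the first differences.
observed = [10, 12, 15, 19, 24]
forecast_of_differences = [6, 7]

levels = invert_difference(observed, forecast_of_differences, lag=1)
print(levels)  # [30, 37]
```

With `lag=1` this is a cumulative sum started at the last observation; with `lag=12` the same function undoes a seasonal difference on monthly data. For second-order differencing, the inversion would be applied twice, once per differencing pass.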
