How to solve LinAlgError & ValueError when training an ARIMA model with Python

Problem description

I am trying to implement a time series model and am getting some strange exceptions that tell me nothing. I wonder whether I am making a mistake or whether this is entirely expected. Here are the details.

When training my model, I run a grid search to find the best (p, d, q) settings. Here is the complete code (I explain what it does further down):

The reproducible code below is essentially a copy of the code from https://machinelearningmastery.com/grid-search-arima-hyperparameters-with-python/, with some slight changes:

import warnings
import numpy as np  # needed for np.array in the dataset below
from pandas import Series
from statsmodels.tsa.arima_model import ARIMA
from sklearn.metrics import mean_squared_error

# evaluate an ARIMA model for a given order (p,d,q)
def evaluate_arima_model(X, arima_order):
    # prepare training dataset
    train_size = int(len(X) * 0.66)
    train, test = X[0:train_size], X[train_size:]
    history = [x for x in train]
    # make predictions
    predictions = list()
    for t in range(len(test)):
        model = ARIMA(history, order=arima_order)
        model_fit = model.fit(disp=0)
        yhat = model_fit.forecast()[0]
        predictions.append(yhat)
        history.append(test[t])
    # calculate out of sample error
    error = mean_squared_error(test, predictions)
    return error

# evaluate combinations of p, d and q values for an ARIMA model
def evaluate_models(dataset, p_values, d_values, q_values):
    dataset = dataset.astype('float64')
    best_score, best_cfg = float("inf"), None
    for p in p_values:
        for d in d_values:
            for q in q_values:
                order = (p,d,q)
                try:
                    print("Evaluating the settings: ", p, d, q)
                    mse = evaluate_arima_model(dataset, order)
                    if mse < best_score:
                        best_score, best_cfg = mse, order
                    print('ARIMA%s MSE=%.3f' % (order,mse))
                except Exception as exception:
                    print("Exception occured...", type(exception).__name__, "\n", exception)

    print('Best ARIMA%s MSE=%.3f' % (best_cfg, best_score))

# dataset
values = np.array([-1.45, -9.04, -3.64, -10.37, -1.36, -6.83, -6.01, -3.84, -9.92, -5.21,
                   -8.97, -6.19, -4.12, -11.03, -2.27, -4.07, -5.08, -4.57, -7.87, -2.80,
                   -4.29, -4.19, -3.76, -22.54, -5.87, -6.39, -4.19, -2.63, -8.70, -3.52, 
                   -5.76, -1.41, -6.94, -12.95, -8.64, -7.21, -4.05, -3.01])

# evaluate parameters
p_values = [7, 8, 9, 10]
d_values = range(0, 3)
q_values = range(0, 3)
warnings.filterwarnings("ignore")
evaluate_models(values, p_values, d_values, q_values)

And here is the output (not everything but it gives enough information):

Evaluating the settings:  7 0 0
Exception occured... LinAlgError 
 SVD did not converge
Evaluating the settings:  7 0 1
Exception occured... LinAlgError 
 SVD did not converge
Evaluating the settings:  7 0 2
Exception occured... ValueError 
 The computed initial AR coefficients are not stationary
You should induce stationarity, choose a different model order, or you can
pass your own start_params.
Evaluating the settings:  7 1 0
Exception occured... LinAlgError 
 SVD did not converge
Evaluating the settings:  7 1 1
Exception occured... ValueError 
 The computed initial AR coefficients are not stationary
You should induce stationarity, choose a different model order, or you can
pass your own start_params.
Evaluating the settings:  7 1 2
Exception occured... ValueError 
 The computed initial AR coefficients are not stationary
You should induce stationarity, choose a different model order, or you can
pass your own start_params.
Evaluating the settings:  7 2 0
Exception occured... LinAlgError 
 SVD did not converge
Evaluating the settings:  7 2 1
Exception occured... ValueError 
 The computed initial AR coefficients are not stationary
You should induce stationarity, choose a different model order, or you can
pass your own start_params.
Evaluating the settings:  7 2 2
Exception occured... ValueError 
 The computed initial AR coefficients are not stationary
You should induce stationarity, choose a different model order, or you can
pass your own start_params.

The code is simply trying all different given settings, training the model, calculating MSE (mean squared error) for each given setting, and then selecting the best one (based on minimum MSE).

But during training, the code keeps throwing LinAlgError and ValueError exceptions, which tell me nothing.

As far as I can follow it, when one of these exceptions is thrown the code does not actually train the model for that setting; it just jumps to the next setting to be tried.

Why do I see these exceptions? Can they be ignored? What do I need to do to resolve them?

Recommended answer

First, to answer your specific question: I think the "SVD did not converge" error is a bug in the ARIMA model of Statsmodels. The SARIMAX model is better supported these days (and does everything the ARIMA model does, plus more), so I would recommend using that instead. To do so, replace the model creation with:

import statsmodels.api as sm  # provides sm.tsa.SARIMAX

model = sm.tsa.SARIMAX(history, trend='c', order=arima_order, enforce_stationarity=False, enforce_invertibility=False)

With that being said, I think that you are still unlikely to get good results given your time series and the specifications you are trying.

In particular, your time series is very short, and you are only considering extremely long autoregressive lag lengths (p > 6). It will be difficult to estimate that many parameters with so few data points, particularly when you also have integration (d = 1 or d = 2) and when you also add in moving average components. I suggest that you re-evaluate which models you are considering.
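
To make the point about parameter count concrete, the following rough sketch compares the number of estimated parameters with the number of observations left after differencing. The helper `parameter_budget` is hypothetical, and the count (AR terms, MA terms, an optional constant, and the innovation variance) is a simplifying assumption rather than an exact statement of what Statsmodels estimates internally.

```python
def parameter_budget(n_obs, order, with_constant=True):
    """Return (estimated parameters, observations left after differencing)."""
    p, d, q = order
    # p AR terms + q MA terms + optional constant + 1 innovation variance.
    n_params = p + q + (1 if with_constant else 0) + 1
    # Each order of differencing drops one observation.
    n_effective = n_obs - d
    return n_params, n_effective


# The question uses 38 points with a 66% training split.
n_train = int(38 * 0.66)

for order in [(1, 0, 0), (7, 0, 0), (7, 1, 2), (10, 2, 2)]:
    n_params, n_effective = parameter_budget(n_train, order)
    print("ARIMA%s: %d parameters, %d usable observations"
          % (order, n_params, n_effective))
```

For the first training window this gives 11 parameters against 24 observations for order (7, 1, 2), and 14 against 23 for order (10, 2, 2), which is very little data per parameter.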
