How to solve LinAlgError & ValueError when training an ARIMA model with Python

Problem description

I am trying to implement a time series model and am getting some strange exceptions that mean nothing to me. I wonder whether I am making a mistake or whether this is expected. Here are the details...

When training my model, I run a grid search to find the best (p, d, q) settings. Here is the complete code (I explain below what is happening):

The reproducible code below is essentially a copy of https://machinelearningmastery.com/grid-search-arima-hyperparameters-with-python/, with a few slight changes:

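import warnings
import numpy as np  # needed for np.array below
# NOTE: this legacy ARIMA class is not usable in statsmodels >= 0.13;
# an older statsmodels release is needed to reproduce the output below.
from statsmodels.tsa.arima_model import ARIMA
from sklearn.metrics import mean_squared_error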

# evaluate an ARIMA model for a given order (p,d,q)
def evaluate_arima_model(X, arima_order):
    # prepare training dataset
    train_size = int(len(X) * 0.66)
    train, test = X[0:train_size], X[train_size:]
    history = [x for x in train]
    # make predictions
    predictions = list()
    for t in range(len(test)):
        model = ARIMA(history, order=arima_order)
        model_fit = model.fit(disp=0)
        yhat = model_fit.forecast()[0]
        predictions.append(yhat)
        history.append(test[t])
    # calculate out of sample error
    error = mean_squared_error(test, predictions)
    return error

# evaluate combinations of p, d and q values for an ARIMA model
def evaluate_models(dataset, p_values, d_values, q_values):
    dataset = dataset.astype('float64')
    best_score, best_cfg = float("inf"), None
    for p in p_values:
        for d in d_values:
            for q in q_values:
                order = (p,d,q)
                try:
                    print("Evaluating the settings: ", p, d, q)
                    mse = evaluate_arima_model(dataset, order)
                    if mse < best_score:
                        best_score, best_cfg = mse, order
                    print('ARIMA%s MSE=%.3f' % (order,mse))
                except Exception as exception:
                    print("Exception occured...", type(exception).__name__, "\n", exception)

    print('Best ARIMA%s MSE=%.3f' % (best_cfg, best_score))

# dataset
values = np.array([-1.45, -9.04, -3.64, -10.37, -1.36, -6.83, -6.01, -3.84, -9.92, -5.21,
                   -8.97, -6.19, -4.12, -11.03, -2.27, -4.07, -5.08, -4.57, -7.87, -2.80,
                   -4.29, -4.19, -3.76, -22.54, -5.87, -6.39, -4.19, -2.63, -8.70, -3.52, 
                   -5.76, -1.41, -6.94, -12.95, -8.64, -7.21, -4.05, -3.01])

# evaluate parameters
p_values = [7, 8, 9, 10]
d_values = range(0, 3)
q_values = range(0, 3)
warnings.filterwarnings("ignore")
evaluate_models(values, p_values, d_values, q_values)

And here is the output (not all of it, but enough to show the problem):

Evaluating the settings:  7 0 0
Exception occured... LinAlgError 
 SVD did not converge
Evaluating the settings:  7 0 1
Exception occured... LinAlgError 
 SVD did not converge
Evaluating the settings:  7 0 2
Exception occured... ValueError 
 The computed initial AR coefficients are not stationary
You should induce stationarity, choose a different model order, or you can
pass your own start_params.
Evaluating the settings:  7 1 0
Exception occured... LinAlgError 
 SVD did not converge
Evaluating the settings:  7 1 1
Exception occured... ValueError 
 The computed initial AR coefficients are not stationary
You should induce stationarity, choose a different model order, or you can
pass your own start_params.
Evaluating the settings:  7 1 2
Exception occured... ValueError 
 The computed initial AR coefficients are not stationary
You should induce stationarity, choose a different model order, or you can
pass your own start_params.
Evaluating the settings:  7 2 0
Exception occured... LinAlgError 
 SVD did not converge
Evaluating the settings:  7 2 1
Exception occured... ValueError 
 The computed initial AR coefficients are not stationary
You should induce stationarity, choose a different model order, or you can
pass your own start_params.
Evaluating the settings:  7 2 2
Exception occured... ValueError 
 The computed initial AR coefficients are not stationary
You should induce stationarity, choose a different model order, or you can
pass your own start_params.

The code simply tries every given setting, trains the model, calculates the MSE (mean squared error) for each setting, and then selects the best one (lowest MSE).

But during training, the code keeps throwing LinAlgError and ValueError exceptions, which mean nothing to me.

As far as I can tell, the code never actually finishes training the settings that throw these exceptions; it just jumps to the next setting to try.
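That skipping comes from the broad `except Exception` in `evaluate_models`: any order whose fit raises is printed and then never scored. Below is a minimal sketch in plain Python (`grid_search` and `fake_evaluate` are hypothetical names, not from the original code or any library) that keeps the same loop but records each failed order and its exception, so you can check afterwards which settings were actually compared.

```python
from itertools import product


def grid_search(evaluate, p_values, d_values, q_values):
    """Return (best_order, best_score, failures) for a pluggable scorer.

    `evaluate(order)` is a hypothetical callable that returns an error
    score (lower is better) or raises if the model cannot be fitted.
    """
    best_order, best_score = None, float("inf")
    failures = {}  # order -> "ExceptionName: message"
    for order in product(p_values, d_values, q_values):
        try:
            score = evaluate(order)
        except Exception as exc:  # e.g. LinAlgError or ValueError from the fit
            failures[order] = f"{type(exc).__name__}: {exc}"
            continue  # this order is never scored
        if score < best_score:
            best_order, best_score = order, score
    return best_order, best_score, failures


def fake_evaluate(order):
    # Stand-in scorer for illustration only: pretends every p > 1 fails.
    p, d, q = order
    if p > 1:
        raise ValueError("initial AR coefficients are not stationary")
    return float(p + d + q)


best, score, failed = grid_search(fake_evaluate, [0, 1, 2], [0, 1], [0, 1])
print(best, score)
for order, reason in failed.items():
    print("skipped", order, "->", reason)
```

With the question's code, you would pass something like `lambda order: evaluate_arima_model(values.astype('float64'), order)` as `evaluate` and then print `failed`; if every order in the grid lands there, the "best" result is meaningless.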

Why do I see these exceptions? Can they be ignored? What do I need to do to solve this?

Recommended answer

First, to answer your specific question: I think "SVD did not converge" is a bug in the Statsmodels ARIMA model. The SARIMAX model is better supported these days (and does everything the ARIMA model does, plus more), so I would recommend using it instead. To do so, replace the model creation with:

import statsmodels.api as sm

model = sm.tsa.SARIMAX(history, trend='c', order=arima_order,
                       enforce_stationarity=False, enforce_invertibility=False)

That said, given your time series and the specifications you are trying, I think you are still unlikely to get good results.

In particular, your time series is very short, and you are only considering very long autoregressive lag lengths (p > 6). It will be difficult to estimate that many parameters from so few data points, particularly when you also have integration (d = 1 or d = 2) and add moving-average components. I suggest re-evaluating which models you are considering.
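To put rough numbers on this, here is a back-of-the-envelope check in plain Python (an editorial sketch, not part of the original answer). It assumes that each ARIMA(p, d, q) fit with a constant estimates p AR coefficients, q MA coefficients and one constant, ignores the noise variance, and treats differencing d times as dropping d observations. It uses the 38-value series and the 66% training split from the question's code.

```python
from itertools import product

N_VALUES = 38                      # length of `values` in the question
TRAIN_SIZE = int(N_VALUES * 0.66)  # first training window in evaluate_arima_model


def obs_per_param(p, d, q, n_obs=TRAIN_SIZE):
    """Rough observations-per-parameter ratio (assumption: AR + MA + constant)."""
    n_params = p + q + 1
    return (n_obs - d) / n_params


# Same grid as in the question: p in 7..10, d in 0..2, q in 0..2.
ratios = {order: obs_per_param(*order)
          for order in product([7, 8, 9, 10], range(3), range(3))}
best = max(ratios, key=ratios.get)
worst = min(ratios, key=ratios.get)
print(f"first training window: {TRAIN_SIZE} observations")
print(f"most favourable order  {best}: {ratios[best]:.2f} obs/param")
print(f"least favourable order {worst}: {ratios[worst]:.2f} obs/param")
```

For the first walk-forward fit, even the most favourable order in the grid, (7, 0, 0), has only about three observations per estimated parameter, and (10, 2, 2) has fewer than two. Later fits in the loop see up to 37 points, which helps only modestly.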
