VAR model with pandas + statsmodels in Python

Question

I am an avid user of R, but recently switched to Python for a few different reasons. However, I am struggling a little to run the vector AR model in Python from statsmodels.

Q#1. I get an error when I run this, and I have a suspicion it has something to do with the type of my vector.

    import numpy as np
    import statsmodels.tsa.api
    from statsmodels import datasets
    import datetime as dt
    import pandas as pd
    from pandas import Series
    from pandas import DataFrame
    import os

    df = pd.read_csv('myfile.csv')
    speedonly = DataFrame(df['speed'])
    results = statsmodels.tsa.api.VAR(speedonly)

    Traceback (most recent call last):
    File "<pyshell#14>", line 1, in <module>
      results = statsmodels.tsa.api.VAR(speedonly)
    File "C:\Python27\lib\site-packages\statsmodels\tsa\vector_ar\var_model.py", line 336, in __init__
      super(VAR, self).__init__(endog, None, dates, freq)
    File "C:\Python27\lib\site-packages\statsmodels\tsa\base\tsa_model.py", line 40, in __init__
      self._init_dates(dates, freq)
    File "C:\Python27\lib\site-packages\statsmodels\tsa\base\tsa_model.py", line 54, in _init_dates
      raise ValueError("dates must be of type datetime")
    ValueError: dates must be of type datetime

Now, interestingly, when I run the VAR example from here https://github.com/statsmodels/statsmodels/blob/master/docs/source/vector_ar.rst#id5, it works fine.

I also tried the VAR model with a third, shorter vector, ts, from Wes McKinney's "Python for Data Analysis" (page 293), and it doesn't work.

Okay, so now I'm thinking it's because the vectors are different types:

    >>> speedonly.head()
         speed
    0  559.984
    1  559.984
    2  559.984
    3  559.984
    4  559.984
    >>> type(speedonly)
    <class 'pandas.core.frame.DataFrame'> #DOESN'T WORK

    >>> type(data)
    <type 'numpy.ndarray'> #WORKS

    >>> ts
    2011-01-02   -0.682317
    2011-01-05    1.121983
    2011-01-07    0.507047
    2011-01-08   -0.038240
    2011-01-10   -0.890730
    2011-01-12   -0.388685
    >>> type(ts)
    <class 'pandas.core.series.TimeSeries'> #DOESN'T WORK

So I convert speedonly to an ndarray... and it still doesn't work. But this time I get another error:

    >>> nda_speedonly = np.array(speedonly)
    >>> results = statsmodels.tsa.api.VAR(nda_speedonly)

    Traceback (most recent call last):
    File "<pyshell#47>", line 1, in <module>
      results = statsmodels.tsa.api.VAR(nda_speedonly)
    File "C:\Python27\lib\site-packages\statsmodels\tsa\vector_ar\var_model.py", line 345, in __init__
      self.neqs = self.endog.shape[1]
    IndexError: tuple index out of range

Any suggestions?

Q#2. I have exogenous feature variables in my data set that appear to be useful for predictions. Is the above model from statsmodels even the best one to use?

Recommended answer

When you give a pandas object to a time-series model, it expects the index to contain dates. The error message has been improved in the current source (to be released soon):

    ValueError: Given a pandas object and the index does not contain dates

In the second case, you're giving a single 1d series to a VAR. VARs are used when you have more than one series. That's why you get the shape error: it expects a second dimension in your array. We could probably improve the error message here. For a single-series AR model with exogenous variables, you probably want to use sm.tsa.ARMA. Note that there is a known bug in ARMA.predict for models with exogenous variables, which should be fixed soon. If you could provide a test case for this, it would be helpful.
