Python statsmodels robust cov_type='hac-panel' issue

Question

I hope this is the right place for my question.

I would like to understand how to use the 'hac-panel' cov_type when running sm.OLS. I have struggled with it the whole day but still cannot figure it out. Here is an example of my code (with data):

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from pandas.tseries.offsets import BQuarterBegin

# Just grabbing some random data here
dat = sm.datasets.macrodata.load_pandas().data
# macrodata stores year and quarter as floats, so cast them to int first
dat['time'] = pd.to_datetime(dat['year'].astype(int).astype(str), format='%Y')
dat['time'] = dat.apply(lambda x: x['time'] + BQuarterBegin(int(x['quarter'])), axis=1)
dat = dat.set_index('time')
dat = dat.sort_index()
dat['dGDP'] = (dat['realgdp'] - dat['realgdp'].shift(1))/dat['realgdp'].shift(1) * 100.0
dat['dM1'] = (dat['m1'] - dat['m1'].shift(1))/dat['m1'].shift(1) * 100.0
dat['dUEMP'] = dat['unemp'] - dat['unemp'].shift(1)
dat['dCPI'] = dat['infl'] - dat['infl'].shift(1)
dat = dat[['dGDP', 'dM1', 'dUEMP', 'dCPI']]

# Fitting the model
y_var = dat.unstack()
x_var = pd.DataFrame(dat.shift(1).unstack(), columns=['01m']).combine_first(pd.DataFrame(dat.shift(3).unstack(), columns=['03m'])).combine_first(pd.DataFrame(dat.shift(12).unstack(), columns=['12m']))

model = sm.OLS(y_var, sm.add_constant(x_var), missing='drop')
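To see why row counts matter for the covariance keywords, here is a toy sketch (made-up numbers, not macrodata; the names quarters, toy and stacked are just for illustration) of what dat.unstack() does: it stacks the columns into one long Series indexed by (column, time), so the regression has one row per (series, quarter) pair rather than one row per quarter. With four series and missing='drop', model.nobs therefore will not equal len(dat.index).

```python
import numpy as np
import pandas as pd

# Toy stand-in for `dat` (made-up values): 2 series observed over 4 quarters.
quarters = pd.date_range('2000-01-01', periods=4, freq='QS')
toy = pd.DataFrame({'a': np.arange(4.0), 'b': np.arange(4.0) * 2}, index=quarters)

# unstack() on a DataFrame with a flat index returns a Series whose
# MultiIndex is (column name, original index): one row per (series, quarter).
stacked = toy.unstack()
print(stacked.index.nlevels)                         # 2
print(len(stacked), len(toy.index))                  # 8 4
print(stacked.index.get_level_values(0).tolist())    # ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']
```

The outer level is the series name and the inner level is time, and rows come out sorted by series first, which is the layout a per-series panel covariance expects.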

This works, and as far as I understand the docs it enforces a HAC covariance. However, I am not sure whether I am calling it correctly:

res = model.fit(cov_type='hac-panel', cov_kwds={'time': dat.index, 'maxlags': 11})
res.summary()
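As far as I can tell from the statsmodels source (my reading, not documented behaviour), when only 'time' is passed to 'hac-panel', the rows are split into panel units wherever the time value goes backwards, and Newey-West lags are only taken within each unit. The sketch below reimplements that splitting rule in plain NumPy; panel_bounds_from_time is a hypothetical helper, not a statsmodels function.

```python
import numpy as np


def panel_bounds_from_time(time):
    """Return (start, stop) row ranges, one per panel unit.

    Hypothetical helper that mimics (assumption) how 'hac-panel' splits
    rows when only a `time` array is supplied: a new unit starts wherever
    the time value is smaller than in the previous row.
    """
    time = np.asarray(time)
    # Positions where time drops relative to the previous row.
    starts = (np.nonzero(time[1:] < time[:-1])[0] + 1).tolist()
    return list(zip([0] + starts, starts + [len(time)]))


# Stacked panel: 3 units, each observed at periods 0..4, sorted by unit then time.
print(panel_bounds_from_time(np.tile(np.arange(5), 3)))  # [(0, 5), (5, 10), (10, 15)]
# A single, strictly increasing time index is read as one long unit.
print(panel_bounds_from_time(np.arange(5)))              # [(0, 5)]
```

If that reading is right, passing dat.index (sorted, one entry per quarter) describes a single unit, and since it has fewer entries than model.nobs it does not line up with the stacked regression rows. It is worth checking that whatever is passed as 'time' or 'groups' has exactly model.nobs entries, in model row order.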

Here is where I have a problem. Let's say I want to also cluster by time, which I think should be something like this:

model.fit(cov_type='hac-panel', cov_kwds={'time': dat.index, 'groups': dat.index, 'maxlags': 11})

All help is really appreciated, and thank you very much in advance. Even a pointer to an example would be great; I couldn't find anything.

I get this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-74b3e662267b> in <module>
----> 1 model.fit(cov_type='hac-panel', cov_kwds={'time': dat.index, 'groups': dat.index, 'maxlags': 11})

~\anaconda3\lib\site-packages\statsmodels\regression\linear_model.py in fit(self, method, cov_type, cov_kwds, use_t, **kwargs)
    343                 self, beta,
    344                 normalized_cov_params=self.normalized_cov_params,
--> 345                 cov_type=cov_type, cov_kwds=cov_kwds, use_t=use_t)
    346         else:
    347             lfit = RegressionResults(

~\anaconda3\lib\site-packages\statsmodels\regression\linear_model.py in __init__(self, model, params, normalized_cov_params, scale, cov_type, cov_kwds, use_t, **kwargs)
   1555                 # TODO: warn or not?
   1556             self.get_robustcov_results(cov_type=cov_type, use_self=True,
-> 1557                                        use_t=use_t, **cov_kwds)
   1558         for key in kwargs:
   1559             setattr(self, key, kwargs[key])

~\anaconda3\lib\site-packages\statsmodels\regression\linear_model.py in get_robustcov_results(self, cov_type, use_t, **kwargs)
   2490             res.cov_params_default = sw.cov_nw_panel(self, maxlags, groupidx,
   2491                                                      weights_func=weights_func,
-> 2492                                                      use_correction=use_correction)
   2493             res.cov_kwds['description'] = descriptions['HAC-Panel']
   2494 

~\anaconda3\lib\site-packages\statsmodels\stats\sandwich_covariance.py in cov_nw_panel(results, nlags, groupidx, weights_func, use_correction)
    785     xu, hessian_inv = _get_sandwich_arrays(results)
    786 
--> 787     S_hac = S_nw_panel(xu, weights, groupidx)
    788     cov_hac = _HCCM2(hessian_inv, S_hac)
    789     if use_correction:

~\anaconda3\lib\site-packages\statsmodels\stats\sandwich_covariance.py in S_nw_panel(xw, weights, groupidx)
    723     S = weights[0] * np.dot(xw.T, xw)  #weights just for completeness
    724     for lag in range(1, nlags+1):
--> 725         xw0, xwlag = lagged_groups(xw, lag, groupidx)
    726         s = np.dot(xw0.T, xwlag)
    727         S += weights[lag] * (s + s.T)

~\anaconda3\lib\site-packages\statsmodels\stats\sandwich_covariance.py in lagged_groups(x, lag, groupidx)
    706 
    707     if out0 == []:
--> 708         raise ValueError('all groups are empty taking lags')
    709     #return out0, out_lagged
    710     return np.vstack(out0), np.vstack(out_lagged)

ValueError: all groups are empty taking lags

Answer

I was looking for an example, and yours was very helpful.

The only problem with your code seems to be that you use the same index, dat.index, for both 'time' and 'groups' in:

cov_kwds={'time': dat.index, 'groups': dat.index, 'maxlags': 11} 

Basically, it treats every unique value in dat.index as a separate group, which in your case means every quarter. Because the same index is also your time variable, each group contains only the observations from a single quarter, i.e. a time span of one period. With only one period per group there are no lags to take, hence the error.
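This can be reproduced without statsmodels. The sketch below assumes, based on my reading of the statsmodels source and the lagged_groups frame in the traceback, that 'hac-panel' turns 'groups' into contiguous blocks wherever the label changes between consecutive rows (ignoring 'time' when 'groups' is given) and only uses blocks longer than the lag. Both helper names are hypothetical.

```python
import numpy as np


def panel_bounds_from_groups(groups):
    """Hypothetical helper: contiguous (start, stop) blocks of equal labels.

    Assumption: mirrors how 'hac-panel' turns the `groups` keyword into
    row ranges; rows must already be sorted so each group is contiguous.
    """
    groups = np.asarray(groups)
    # A new block starts wherever the label differs from the previous row.
    starts = (np.nonzero(groups[:-1] != groups[1:])[0] + 1).tolist()
    return list(zip([0] + starts, starts + [len(groups)]))


def count_lagged_pairs(bounds, lag):
    """Count (t, t - lag) row pairs that stay inside one group.

    Raises the same message as statsmodels' lagged_groups when no group
    is longer than `lag`.
    """
    pairs = sum(stop - start - lag for start, stop in bounds if start + lag < stop)
    if pairs == 0:
        raise ValueError('all groups are empty taking lags')
    return pairs


# groups = time index: every row (quarter) is its own group of length one.
one_per_row = panel_bounds_from_groups(np.arange(6))
try:
    count_lagged_pairs(one_per_row, lag=1)
except ValueError as err:
    print(err)  # all groups are empty taking lags

# groups = series label: two series observed for three periods each.
by_series = panel_bounds_from_groups(np.repeat(['dGDP', 'dM1'], 3))
print(by_series)                             # [(0, 3), (3, 6)]
print(count_lagged_pairs(by_series, lag=1))  # 4
```

Under those assumptions, a fix would be to pass a label identifying each series (the outer level of the stacked index), aligned with the rows the model actually keeps after missing='drop', so each group is a full time series and maxlags=11 has room to work. If the aim is to allow correlation across series within the same period as well as over time, statsmodels also documents cov_type='hac-groupsum' (Driscoll-Kraay, taking 'time' and 'maxlags'), and cov_type='cluster' with 'groups' for one-way clustering. I have not run these on this data, so treat them as options to check rather than a drop-in answer.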
