Python statsmodels robust cov_type='hac-panel' issue
Problem description
I hope this is the right place for my question.
I would like to understand how to use the 'hac-panel' cov_type when running sm.OLS. I have struggled with it the whole day but still cannot figure it out. Here is an example of my code (with data):
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from pandas.tseries.offsets import BQuarterBegin
# Just grabbing some random data here
dat = sm.datasets.macrodata.load_pandas().data
# 'year' and 'quarter' are stored as floats, so cast them before building dates
dat['time'] = pd.to_datetime(dat['year'].astype(int).astype(str), format='%Y')
dat['time'] = dat.apply(lambda x: x['time'] + BQuarterBegin(int(x['quarter'])), axis=1)
dat = dat.set_index('time')
dat = dat.sort_index()
dat['dGDP'] = (dat['realgdp'] - dat['realgdp'].shift(1))/dat['realgdp'].shift(1) * 100.0
dat['dM1'] = (dat['m1'] - dat['m1'].shift(1))/dat['m1'].shift(1) * 100.0
dat['dUEMP'] = dat['unemp'] - dat['unemp'].shift(1)
dat['dCPI'] = dat['infl'] - dat['infl'].shift(1)
dat = dat[['dGDP', 'dM1', 'dUEMP', 'dCPI']]
# Fitting the model
y_var = dat.unstack()
x_var = (
    pd.DataFrame(dat.shift(1).unstack(), columns=['01m'])
    .combine_first(pd.DataFrame(dat.shift(3).unstack(), columns=['03m']))
    .combine_first(pd.DataFrame(dat.shift(12).unstack(), columns=['12m']))
)
model = sm.OLS(y_var, sm.add_constant(x_var), missing='drop')
This works, and as far as I understand the docs, it applies the HAC covariance. However, I am not sure whether I am calling it correctly:
res = model.fit(cov_type='hac-panel', cov_kwds={'time': dat.index, 'maxlags': 11})
res.summary()
Here is where I have a problem. Let's say I also want to cluster by time, which I think should look something like this:
model.fit(cov_type='hac-panel', cov_kwds={'time': dat.index, 'groups': dat.index, 'maxlags': 11})
All help is really appreciated. Thank you very much in advance. Even pointing me to an example would be great, as I couldn't find anything.
I get this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-2-74b3e662267b> in <module>
----> 1 model.fit(cov_type='hac-panel', cov_kwds={'time': dat.index, 'groups': dat.index, 'maxlags': 11})
~\anaconda3\lib\site-packages\statsmodels\regression\linear_model.py in fit(self, method, cov_type, cov_kwds, use_t, **kwargs)
343 self, beta,
344 normalized_cov_params=self.normalized_cov_params,
--> 345 cov_type=cov_type, cov_kwds=cov_kwds, use_t=use_t)
346 else:
347 lfit = RegressionResults(
~\anaconda3\lib\site-packages\statsmodels\regression\linear_model.py in __init__(self, model, params, normalized_cov_params, scale, cov_type, cov_kwds, use_t, **kwargs)
1555 # TODO: warn or not?
1556 self.get_robustcov_results(cov_type=cov_type, use_self=True,
-> 1557 use_t=use_t, **cov_kwds)
1558 for key in kwargs:
1559 setattr(self, key, kwargs[key])
~\anaconda3\lib\site-packages\statsmodels\regression\linear_model.py in get_robustcov_results(self, cov_type, use_t, **kwargs)
2490 res.cov_params_default = sw.cov_nw_panel(self, maxlags, groupidx,
2491 weights_func=weights_func,
-> 2492 use_correction=use_correction)
2493 res.cov_kwds['description'] = descriptions['HAC-Panel']
2494
~\anaconda3\lib\site-packages\statsmodels\stats\sandwich_covariance.py in cov_nw_panel(results, nlags, groupidx, weights_func, use_correction)
785 xu, hessian_inv = _get_sandwich_arrays(results)
786
--> 787 S_hac = S_nw_panel(xu, weights, groupidx)
788 cov_hac = _HCCM2(hessian_inv, S_hac)
789 if use_correction:
~\anaconda3\lib\site-packages\statsmodels\stats\sandwich_covariance.py in S_nw_panel(xw, weights, groupidx)
723 S = weights[0] * np.dot(xw.T, xw) #weights just for completeness
724 for lag in range(1, nlags+1):
--> 725 xw0, xwlag = lagged_groups(xw, lag, groupidx)
726 s = np.dot(xw0.T, xwlag)
727 S += weights[lag] * (s + s.T)
~\anaconda3\lib\site-packages\statsmodels\stats\sandwich_covariance.py in lagged_groups(x, lag, groupidx)
706
707 if out0 == []:
--> 708 raise ValueError('all groups are empty taking lags')
709 #return out0, out_lagged
710 return np.vstack(out0), np.vstack(out_lagged)
ValueError: all groups are empty taking lags
Recommended answer
I was looking for an example and yours was very helpful.
The only problem with your code seems to be that it uses the same time index for both 'time' and 'groups' in
cov_kwds={'time': dat.index, 'groups': dat.index, 'maxlags': 11}
Basically, it treats every unique value in dat.index as a separate group, which in your case means every quarter. At the same time it uses that same index as the time indicator, so each group consists only of the observations from a single quarter and spans just one time period. Since each group covers only one period, no lags can be taken within any group, hence the error.