Fixed effect in Pandas or Statsmodels
Question
Is there an existing function to estimate fixed effects (one-way or two-way) in Pandas or Statsmodels?
There used to be a function in Statsmodels, but it seems to have been discontinued. And in Pandas there is something called plm, but I can't import it or run it using pd.plm().
Recommended answer
As noted in the comments, PanelOLS has been removed from Pandas as of version 0.20.0. So you really have three options:
- If you use Python 3, you can use linearmodels, as described in the more recent answer: https://stackoverflow.com/a/44836199/3435183
- Just specify the various dummies in your statsmodels specification, e.g. using pd.get_dummies. This may not be feasible if the number of fixed effects is large.
- Or do some groupby-based demeaning and then use statsmodels (this works even if you're estimating lots of fixed effects). Here is a barebones version of what you could do for one-way fixed effects:
import patsy
import statsmodels.api as sm

def areg(formula, data=None, absorb=None, cluster=None):
    # Build the outcome and design matrices from the formula
    y, X = patsy.dmatrices(formula, data, return_type='dataframe')
    # Demean within each absorbed group, adding back the overall mean
    ybar = y.mean()
    y = y - y.groupby(data[absorb]).transform('mean') + ybar
    Xbar = X.mean()
    X = X - X.groupby(data[absorb]).transform('mean') + Xbar
    reg = sm.OLS(y, X)
    # Account for df loss from FE transform
    reg.df_resid -= (data[absorb].nunique() - 1)
    if cluster is None:
        return reg.fit()
    return reg.fit(cov_type='cluster', cov_kwds={'groups': data[cluster].values})
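The dummy-variable option and the demeaning option give the same slope estimate, which can be checked on a small made-up panel. The sketch below is only an illustration under stated assumptions: the data are synthetic, the column names g, x and y are hypothetical, and np.linalg.lstsq stands in for sm.OLS to keep the example dependency-light.

```python
import numpy as np
import pandas as pd

# Synthetic panel: 5 groups, 8 observations each (made-up data for illustration).
rng = np.random.default_rng(0)
n_groups, n_per = 5, 8
g = np.repeat(np.arange(n_groups), n_per)
alpha = rng.normal(size=n_groups)            # group fixed effects
x = rng.normal(size=g.size) + alpha[g]       # regressor correlated with the effects
y = 2.0 * x + alpha[g] + rng.normal(scale=0.1, size=g.size)
df = pd.DataFrame({"g": g, "x": x, "y": y})

# Dummy-variable approach: one dummy column per group and no separate
# intercept, so no dummy has to be dropped.
dummies = pd.get_dummies(df["g"], prefix="g", dtype=float)
X_lsdv = np.column_stack([df["x"].to_numpy(), dummies.to_numpy()])
beta_lsdv = np.linalg.lstsq(X_lsdv, df["y"].to_numpy(), rcond=None)[0][0]

# Demeaning approach: subtract group means from x and y, then regress
# the demeaned y on the demeaned x without any dummies.
x_w = df["x"] - df.groupby("g")["x"].transform("mean")
y_w = df["y"] - df.groupby("g")["y"].transform("mean")
beta_within = float((x_w * y_w).sum() / (x_w ** 2).sum())

print(beta_lsdv, beta_within)
```

The two printed slopes should agree up to floating-point error. The dummy matrix grows with the number of groups, while the demeaning step does not, which is why the groupby route scales better when there are many fixed effects.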
And here is what you can do if you are using an older version of Pandas:
An example with time fixed effects using pandas' PanelOLS (which is in the plm module). Notice the import of PanelOLS:
>>> from pandas.stats.plm import PanelOLS
>>> df
                 y    x
date       id
2012-01-01 1   0.1  0.2
           2   0.3  0.5
           3   0.4  0.8
           4   0.0  0.2
2012-02-01 1   0.2  0.7
           2   0.4  0.5
           3   0.2  0.3
           4   0.1  0.1
2012-03-01 1   0.6  0.9
           2   0.7  0.5
           3   0.9  0.6
           4   0.4  0.5
Note that the dataframe must have a MultiIndex set; PanelOLS determines the time and entity effects based on the index:
>>> reg = PanelOLS(y=df['y'],x=df[['x']],time_effects=True)
>>> reg
-------------------------Summary of Regression Analysis-------------------------
Formula: Y ~ <x>
Number of Observations: 12
Number of Degrees of Freedom: 4
R-squared: 0.2729
Adj R-squared: 0.0002
Rmse: 0.1588
F-stat (1, 8): 1.0007, p-value: 0.3464
Degrees of Freedom: model 3, resid 8
-----------------------Summary of Estimated Coefficients------------------------
Variable Coef Std Err t-stat p-value CI 2.5% CI 97.5%
--------------------------------------------------------------------------------
x 0.3694 0.2132 1.73 0.1214 -0.0485 0.7872
---------------------------------End of Summary---------------------------------
Docstring:
PanelOLS(self, y, x, weights = None, intercept = True, nw_lags = None,
entity_effects = False, time_effects = False, x_effects = None,
cluster = None, dropped_dummies = None, verbose = False,
nw_overlap = False)
Implements panel OLS.
See ols function docs
This is another function (like fama_macbeth) where I believe the plan is to move this functionality to statsmodels.
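On a pandas version without PanelOLS, the point estimate from the time-effects example can still be reproduced by demeaning within each date. This is a minimal sketch, assuming the twelve rows shown in the dataframe printout; it recovers only the coefficient on x, not the standard errors or the rest of the summary.

```python
import pandas as pd

# Rebuild the example panel with a (date, id) MultiIndex.
dates = pd.to_datetime(["2012-01-01", "2012-02-01", "2012-03-01"])
index = pd.MultiIndex.from_product([dates, [1, 2, 3, 4]], names=["date", "id"])
df = pd.DataFrame(
    {
        "y": [0.1, 0.3, 0.4, 0.0, 0.2, 0.4, 0.2, 0.1, 0.6, 0.7, 0.9, 0.4],
        "x": [0.2, 0.5, 0.8, 0.2, 0.7, 0.5, 0.3, 0.1, 0.9, 0.5, 0.6, 0.5],
    },
    index=index,
)

# Time fixed effects: subtract each date's mean from y and x.
demeaned = df - df.groupby(level="date").transform("mean")

# With a single regressor, the within slope is sum(x*y) / sum(x*x).
beta = (demeaned["x"] * demeaned["y"]).sum() / (demeaned["x"] ** 2).sum()
print(round(beta, 4))  # 0.3694
```

The result matches the Coef column for x in the PanelOLS summary. Grouping on level="id" instead would give entity fixed effects.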