Fixed effects in Pandas or Statsmodels

Question

Is there an existing function in Pandas or Statsmodels to estimate fixed effects (one-way or two-way)?

There used to be a function in Statsmodels, but it seems to have been discontinued. In Pandas there is something called plm, but I can't import it or run it using pd.plm().

Recommended answer

As noted in the comments, PanelOLS was removed from Pandas in version 0.20.0. So you really have three options:

  1. If you use Python 3, you can use linearmodels, as described in the more recent answer: https://stackoverflow.com/a/44836199/3435183

  2. Specify the fixed effects as dummy variables in your statsmodels specification, e.g. using pd.get_dummies. This may not be feasible if the number of fixed effects is large.

  3. Do some groupby-based demeaning and then use statsmodels (this works even if you are estimating lots of fixed effects). Here is a barebones version of what you could do for one-way fixed effects:

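import patsy
import statsmodels.api as sm

def areg(formula, data=None, absorb=None, cluster=None):
    # Build the outcome and design matrices from the formula
    y, X = patsy.dmatrices(formula, data, return_type='dataframe')
    # patsy drops rows with missing values, so align the groups to the rows kept
    groups = data.loc[X.index, absorb]

    # Within transform: remove group means, then add back the overall mean
    y = y - y.groupby(groups).transform('mean') + y.mean()
    X = X - X.groupby(groups).transform('mean') + X.mean()

    reg = sm.OLS(y, X)
    # Account for df loss from FE transform
    reg.df_resid -= (groups.nunique() - 1)

    # Cluster the standard errors only when a cluster column is given
    if cluster is None:
        return reg.fit()
    return reg.fit(cov_type='cluster',
                   cov_kwds={'groups': data.loc[X.index, cluster].values})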
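
The dummy-variable and demeaning approaches should give the same slope estimate, which is why demeaning is a safe substitute when there are too many groups to build dummies for. The sketch below checks this on a small made-up panel using only pandas and numpy; the column names (firm, x, y) and all the numbers are hypothetical and chosen purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy panel: 3 firms observed 4 times each (made-up numbers).
rng = np.random.default_rng(0)
panel = pd.DataFrame({
    "firm": np.repeat(["a", "b", "c"], 4),
    "x": rng.normal(size=12),
})
firm_effect = panel["firm"].map({"a": 1.0, "b": -2.0, "c": 0.5})
panel["y"] = 2.0 * panel["x"] + firm_effect + rng.normal(scale=0.1, size=12)

# Dummy approach: one dummy per firm (no separate intercept), then least squares.
dummies = pd.get_dummies(panel["firm"], dtype=float)
X_dummy = np.column_stack([panel["x"].to_numpy(), dummies.to_numpy()])
beta_dummy = np.linalg.lstsq(X_dummy, panel["y"].to_numpy(), rcond=None)[0][0]

# Demeaning approach: subtract firm means from y and x, then regress y on x.
y_within = panel["y"] - panel.groupby("firm")["y"].transform("mean")
x_within = panel["x"] - panel.groupby("firm")["x"].transform("mean")
beta_within = float((x_within * y_within).sum() / (x_within ** 2).sum())

print(beta_dummy, beta_within)  # the two slope estimates agree
```

Only the point estimates are compared here; standard errors from the demeaned regression still need the degrees-of-freedom correction that areg applies.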

And here is what you can do if you are using an older version of Pandas (before 0.20.0):

An example with time fixed effects using pandas' PanelOLS (which is in the plm module). Note the import of PanelOLS:

>>> from pandas.stats.plm import PanelOLS
>>> df

                y    x
date       id
2012-01-01 1   0.1  0.2
           2   0.3  0.5
           3   0.4  0.8
           4   0.0  0.2
2012-02-01 1   0.2  0.7 
           2   0.4  0.5
           3   0.2  0.3
           4   0.1  0.1
2012-03-01 1   0.6  0.9
           2   0.7  0.5
           3   0.9  0.6
           4   0.4  0.5

Note that the DataFrame must have a MultiIndex set; PanelOLS determines the time and entity effects based on the index:

>>> reg  = PanelOLS(y=df['y'],x=df[['x']],time_effects=True)
>>> reg

-------------------------Summary of Regression Analysis-------------------------

Formula: Y ~ <x>

Number of Observations:         12
Number of Degrees of Freedom:   4

R-squared:         0.2729
Adj R-squared:     0.0002

Rmse:              0.1588

F-stat (1, 8):     1.0007, p-value:     0.3464

Degrees of Freedom: model 3, resid 8

-----------------------Summary of Estimated Coefficients------------------------
      Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
             x     0.3694     0.2132       1.73     0.1214    -0.0485     0.7872
---------------------------------End of Summary--------------------------------- 
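
The same estimate can be obtained on a current installation, where pandas.stats no longer exists, by adding time dummies to an ordinary statsmodels regression. This is a minimal sketch that rebuilds the twelve observations shown above; the helper column name period is an assumption introduced here so that the dates can be treated as a categorical variable.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Rebuild the example data with a (date, id) MultiIndex.
dates = pd.to_datetime(["2012-01-01", "2012-02-01", "2012-03-01"])
index = pd.MultiIndex.from_product([dates, [1, 2, 3, 4]], names=["date", "id"])
df = pd.DataFrame(
    {
        "y": [0.1, 0.3, 0.4, 0.0, 0.2, 0.4, 0.2, 0.1, 0.6, 0.7, 0.9, 0.4],
        "x": [0.2, 0.5, 0.8, 0.2, 0.7, 0.5, 0.3, 0.1, 0.9, 0.5, 0.6, 0.5],
    },
    index=index,
)

# Move the index levels into columns and label each period as a string
# ("period" is a hypothetical helper column, not part of the original data).
flat = df.reset_index()
flat["period"] = flat["date"].dt.strftime("%Y-%m")

# Time fixed effects as dummies: C(period) expands into one dummy per month.
res = smf.ols("y ~ x + C(period)", data=flat).fit()
print(res.params["x"], res.bse["x"], res.df_resid)
```

The coefficient and standard error on x should match the PanelOLS summary (0.3694 and 0.2132, with 8 residual degrees of freedom). The R-squared reported by statsmodels will differ, because it also counts the variation explained by the time dummies.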

Docstring:

PanelOLS(self, y, x, weights = None, intercept = True, nw_lags = None,
entity_effects = False, time_effects = False, x_effects = None,
cluster = None, dropped_dummies = None, verbose = False,
nw_overlap = False)

Implements panel OLS.

See ols function docs

This is another function (like fama_macbeth) where I believe the plan is to move this functionality to statsmodels.
