Fama Macbeth Regression in Python (Pandas or Statsmodels)


Question

Econometric Background

Fama Macbeth regression refers to a procedure for running regressions on panel data (where there are N different individuals and each individual is observed over multiple periods T, e.g. days, months, years). So in total there are N x T observations. Note that it is fine if the panel is not balanced.

The Fama Macbeth procedure first runs a cross-sectional regression for each period, i.e. it pools the N individuals together in a given period t, and does this for t = 1, ..., T. So in total T regressions are run. This gives a time series of coefficients for each independent variable, on which we can perform hypothesis tests. Usually we take the time-series average as the final coefficient of each independent variable, and we use t-stats to test significance.

My Problem

My problem is to implement this in pandas. From the pandas source code, I noticed there is a procedure called fama_macbeth, but I can't find any documentation about it.

The operation can also be done easily through groupby. Currently I am doing this:

def fmreg(data,formula):
    return smf.ols(formula,data=data).fit().params[1]

res=df.groupby('date').apply(fmreg,'ret~var1')

This works: res is a Series indexed by date, and its values are params[1], the coefficient of var1. But now I want to have more independent variables and need to extract the coefficients of all of them, and I can't figure out how. I tried this:

def fmreg(data,formula):
    return smf.ols(formula,data=data).fit().params

res=df.groupby('date').apply(fmreg,'ret~var1+var2+var3')

This doesn't work. The desired result is that res is a DataFrame indexed by date, with one column of coefficients for each of intercept, var1, var2 and var3.

I also checked statsmodels; it doesn't have such a built-in procedure either.

Also, is there any package that can produce publication-quality regression tables, like outreg2 in Stata and texreg in R? Thanks for your help!

Recommended Answer

An update to reflect the library situation for Fama-MacBeth as of Fall 2018. The fama_macbeth function has been removed from pandas for a while now. So what are your options?

  1. If you're using Python 3, you can use the Fama-MacBeth method in LinearModels: https://github.com/bashtage/linearmodels/blob/master/linearmodels/panel/model.py

  2. If you're using Python 2 or just don't want to use LinearModels, then probably your best option is to roll your own.

For example, suppose you have the Fama-French industry portfolios in a panel like the following (you've also computed some variables, such as past beta or past returns, to use as your x-variables):

In [1]: import pandas as pd
        import numpy as np
        import statsmodels.formula.api as smf

In [4]: df = pd.read_csv('industry.csv',parse_dates=['caldt'])
        df.query("caldt == '1995-07-01'")

Out[5]: 
      industry      caldt    ret    beta  r12to2  r36to13
18432     Aero 1995-07-01   6.26  0.9696  0.2755   0.3466
18433    Agric 1995-07-01   3.37  1.0412  0.1260   0.0581
18434    Autos 1995-07-01   2.42  1.0274  0.0293   0.2902
18435    Banks 1995-07-01   4.82  1.4985  0.1659   0.2951
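
The industry.csv file is not part of this page. To experiment with the code that follows, a synthetic panel with the same column names can stand in for it. The sketch below is only an assumption about the layout: the industry list, the date range, and the data-generating model are made up, and the numbers are random rather than real returns.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for industry.csv: random numbers, not real returns.
rng = np.random.default_rng(0)
dates = pd.date_range("1995-01-01", periods=24, freq="MS")  # month-start dates
industries = ["Aero", "Agric", "Autos", "Banks", "Beer", "Books"]

# One row per (month, industry) pair, mirroring the layout shown above.
idx = pd.MultiIndex.from_product([dates, industries], names=["caldt", "industry"])
df = pd.DataFrame(
    {
        "beta": rng.normal(1.0, 0.3, len(idx)),
        "r12to2": rng.normal(0.1, 0.2, len(idx)),
        "r36to13": rng.normal(0.2, 0.3, len(idx)),
    },
    index=idx,
).reset_index()

# Returns generated from an assumed linear model plus noise.
df["ret"] = 0.5 + 1.5 * df["r12to2"] + rng.normal(0.0, 2.0, len(df))
df = df[["industry", "caldt", "ret", "beta", "r12to2", "r36to13"]]

print(df.query("caldt == '1995-07-01'"))
```

Each month needs at least as many rows as estimated coefficients (four here, including the intercept), which is why the sketch uses six industries.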

Fama-MacBeth primarily involves computing the same cross-sectional regression model month by month, so you can implement it using a groupby. You can create a function that takes a DataFrame (it will come from the groupby) and a patsy formula; it then fits the model and returns the parameter estimates. Here is a barebones version of how you could implement it. (Note that this is what the original questioner tried to do a few years ago. I'm not sure why it didn't work, although it's possible that back then the params attribute of the statsmodels result object wasn't returning a pandas Series, so the return value needed to be converted to a Series explicitly. It does work fine in pandas 0.23.4, the current version at the time of writing.)

def ols_coef(x,formula):
    return smf.ols(formula,data=x).fit().params

In [9]: gamma = (df.groupby('caldt')
                .apply(ols_coef,'ret ~ 1 + beta + r12to2 + r36to13'))
        gamma.head()

Out[10]: 
            Intercept      beta     r12to2   r36to13
caldt                                               
1963-07-01  -1.497012 -0.765721   4.379128 -1.918083
1963-08-01  11.144169 -6.506291   5.961584 -2.598048
1963-09-01  -2.330966 -0.741550  10.508617 -4.377293
1963-10-01   0.441941  1.127567   5.478114 -2.057173
1963-11-01   3.380485 -4.792643   3.660940 -1.210426

Then just compute the mean, the standard error of the mean, and a t-statistic (or whatever statistics you want). Something like the following:

def fm_summary(p):
    s = p.describe().T
    s['std_error'] = s['std']/np.sqrt(s['count'])
    s['tstat'] = s['mean']/s['std_error']
    return s[['mean','std_error','tstat']]

In [12]: fm_summary(gamma)
Out[12]: 
               mean  std_error     tstat
Intercept  0.754904   0.177291  4.258000
beta      -0.012176   0.202629 -0.060092
r12to2     1.794548   0.356069  5.039896
r36to13    0.237873   0.186680  1.274230
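
For reference, the quantities that fm_summary computes can be written out as follows. The notation is introduced here for illustration and does not come from the original post: \hat{\gamma}_{k,t} is the coefficient on regressor k from the period-t cross-sectional regression (one cell of gamma), and T is the number of periods. These are the plain standard errors, which treat the period-by-period estimates as serially uncorrelated.

```latex
\bar{\gamma}_k = \frac{1}{T} \sum_{t=1}^{T} \hat{\gamma}_{k,t},
\qquad
s_k^2 = \frac{1}{T-1} \sum_{t=1}^{T} \left( \hat{\gamma}_{k,t} - \bar{\gamma}_k \right)^2,
\qquad
\mathrm{se}(\bar{\gamma}_k) = \frac{s_k}{\sqrt{T}},
\qquad
t_k = \frac{\bar{\gamma}_k}{\mathrm{se}(\bar{\gamma}_k)}
```

The T - 1 divisor matches the sample standard deviation that describe() reports in its std row.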

Improving Speed

Using statsmodels for the regressions has significant overhead (particularly given that you only need the estimated coefficients). If you want better efficiency, you could switch from statsmodels to numpy.linalg.lstsq. Write a new function that does the OLS estimation, something like the following (note that I'm not doing anything like checking the rank of these matrices):

def ols_np(data,yvar,xvar):
    gamma,_,_,_ = np.linalg.lstsq(data[xvar],data[yvar],rcond=None)
    return pd.Series(gamma)
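
One way the numpy-based function might be used is sketched below. This is an illustration under stated assumptions rather than the answer's own code: the toy panel is random, the intercept column name is made up, and this variant of ols_np labels its output with the regressor names. Because lstsq does not add a constant, the sketch adds an explicit column of ones, and it collects the results with an explicit loop over the groups instead of relying on how groupby.apply assembles returned Series.

```python
import numpy as np
import pandas as pd

def ols_np(data, yvar, xvar):
    # Same idea as above, but the result is labeled with the regressor names.
    gamma, _, _, _ = np.linalg.lstsq(
        data[xvar].to_numpy(), data[yvar].to_numpy(), rcond=None
    )
    return pd.Series(gamma, index=xvar)

# Hypothetical toy panel: 36 months, 10 entities per month, random data.
rng = np.random.default_rng(1)
n_months, n_ids = 36, 10
toy = pd.DataFrame({
    "caldt": pd.date_range("2000-01-01", periods=n_months, freq="MS").repeat(n_ids),
    "beta": rng.normal(1.0, 0.3, n_months * n_ids),
    "r12to2": rng.normal(0.1, 0.2, n_months * n_ids),
})
toy["ret"] = 0.5 + 2.0 * toy["r12to2"] + rng.normal(0.0, 1.0, len(toy))

# lstsq does not add a constant, so include an explicit column of ones.
toy["intercept"] = 1.0
xvar = ["intercept", "beta", "r12to2"]

# One regression per month; rows of the result are dates, columns are regressors.
gamma = pd.DataFrame(
    {date: ols_np(group, "ret", xvar) for date, group in toy.groupby("caldt")}
).T
gamma.index.name = "caldt"

print(gamma.head())
```

The resulting gamma has the same shape as the statsmodels-based version (one row per date, one column per coefficient), so it can be passed to fm_summary unchanged.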

And if you're still using an older version of pandas, the following will work.

Here is an example of using the fama_macbeth function in pandas:

>>> df

                y    x
date       id
2012-01-01 1   0.1  0.4
           2   0.3  0.6
           3   0.4  0.2
           4   0.0  1.2
2012-02-01 1   0.2  0.7
           2   0.4  0.5
           3   0.2  0.1
           4   0.1  0.0
2012-03-01 1   0.4  0.8
           2   0.6  0.1
           3   0.7  0.6
           4   0.4 -0.1

Notice the structure. The fama_macbeth function expects the y-var and x-vars to have a multi-index with date as the first level and the stock/firm/entity id as the second level:

>>> fm  = pd.fama_macbeth(y=df['y'],x=df[['x']])
>>> fm


----------------------Summary of Fama-MacBeth Analysis-------------------------

Formula: Y ~ x + intercept
# betas :   3

----------------------Summary of Estimated Coefficients------------------------
     Variable          Beta       Std Err        t-stat       CI 2.5%      CI 97.5%
          (x)       -0.0227        0.1276         -0.18       -0.2728        0.2273
  (intercept)        0.3531        0.0842          4.19        0.1881        0.5181

--------------------------------End of Summary---------------------------------

Note that just printing fm calls fm.summary:

>>> fm.summary

----------------------Summary of Fama-MacBeth Analysis-------------------------

Formula: Y ~ x + intercept
# betas :   3

----------------------Summary of Estimated Coefficients------------------------
     Variable          Beta       Std Err        t-stat       CI 2.5%      CI 97.5%
          (x)       -0.0227        0.1276         -0.18       -0.2728        0.2273
  (intercept)        0.3531        0.0842          4.19        0.1881        0.5181

--------------------------------End of Summary---------------------------------

Also, note that the fama_macbeth function automatically adds an intercept (unlike the array-based statsmodels routines such as sm.OLS, which do not). In addition, the x-var has to be a DataFrame, so if you pass just one column you need to pass it as df[['x']].

If you don't want an intercept you have to do:

>>> fm  = pd.fama_macbeth(y=df['y'],x=df[['x']],intercept=False)
