Fama-MacBeth Regression in Python (Pandas or Statsmodels)
Question
Econometric Background
Fama-MacBeth regression is a procedure for running regressions on panel data, where there are N individuals and each is observed over multiple periods T (e.g. days, months, or years), for up to N x T observations in total. The panel does not need to be balanced.
The procedure first runs a cross-sectional regression for each period, i.e. it pools the N individuals observed in a given period t, and repeats this for t = 1, ..., T, so T regressions are run in total. This yields a time series of coefficient estimates for each independent variable, which can then be used for hypothesis testing. Usually the time-series average is taken as the final coefficient for each independent variable, and a t-statistic is used to test its significance.
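In symbols, the two steps described above can be sketched as follows. The notation (y, x, gamma, K regressors, N_t individuals in period t) is introduced here for illustration only, and the T - 1 divisor assumes the usual sample variance of the coefficient series:

```latex
% Step 1: one cross-sectional regression per period t = 1, ..., T
y_{i,t} = \gamma_{0,t} + \sum_{k=1}^{K} \gamma_{k,t}\, x_{k,i,t} + \varepsilon_{i,t},
\qquad i = 1, \dots, N_t

% Step 2: time-series average, standard error, and t-statistic for each coefficient k
\bar{\gamma}_k = \frac{1}{T} \sum_{t=1}^{T} \hat{\gamma}_{k,t},
\qquad
s_k^2 = \frac{1}{T-1} \sum_{t=1}^{T} \left( \hat{\gamma}_{k,t} - \bar{\gamma}_k \right)^2,
\qquad
t_k = \frac{\bar{\gamma}_k}{s_k / \sqrt{T}}
```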
My Problem
My problem is implementing this in pandas. In the pandas source code I noticed a procedure called fama_macbeth, but I can't find any documentation for it.
The operation can also be done easily with groupby. Currently I am doing this:
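import statsmodels.formula.api as smf

def fmreg(data,formula):
    # params[1] is the coefficient on the first regressor (params[0] is the intercept)
    return smf.ols(formula,data=data).fit().params[1]
res=df.groupby('date').apply(fmreg,'ret~var1')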
This works: res is a Series indexed by date whose values are params[1], the coefficient on var1. Now I want to add more independent variables and extract the coefficients of all of them, but I can't figure out how. I tried this:
def fmreg(data,formula):
    return smf.ols(formula,data=data).fit().params
res=df.groupby('date').apply(fmreg,'ret~var1+var2+var3')
This doesn't work. The desired result is for res to be a DataFrame indexed by date, with one column per coefficient: intercept, var1, var2 and var3.
I also checked statsmodels, and it doesn't have a built-in procedure for this either.
Also, is there any package that can produce publication-quality regression tables, like outreg2 in Stata or texreg in R?
Thanks for your help!
Recommended Answer
An update to reflect the library situation for Fama-MacBeth as of Fall 2018: the fama_macbeth function was removed from pandas some time ago. So what are your options?
- If you're using Python 3, you can use the Fama-MacBeth method in LinearModels: https://github.com/bashtage/linearmodels/blob/master/linearmodels/panel/model.py
- If you're using Python 2 or just don't want to use LinearModels, then probably your best option is to roll your own.
For example, suppose you have the Fama-French industry portfolios in a panel like the following (you've also computed some variables, such as past beta or past returns, to use as your x-variables):
In [1]: import pandas as pd
        import numpy as np
        import statsmodels.formula.api as smf
In [4]: df = pd.read_csv('industry.csv',parse_dates=['caldt'])
        df.query("caldt == '1995-07-01'")
Out[5]:
      industry      caldt   ret    beta  r12to2  r36to13
18432     Aero 1995-07-01  6.26  0.9696  0.2755   0.3466
18433    Agric 1995-07-01  3.37  1.0412  0.1260   0.0581
18434    Autos 1995-07-01  2.42  1.0274  0.0293   0.2902
18435    Banks 1995-07-01  4.82  1.4985  0.1659   0.2951
Fama-MacBeth primarily involves computing the same cross-sectional regression model month by month, so you can implement it with a groupby. Write a function that takes a DataFrame (it will come from the groupby) and a patsy formula, fits the model, and returns the parameter estimates. Here is a barebones version. Note that this is what the original questioner tried a few years ago; I'm not sure why it didn't work, although it's possible that back then the params attribute of the statsmodels results object wasn't returned as a pandas Series, so the return value had to be converted to a Series explicitly. It works fine with the current version of pandas, 0.23.4:
def ols_coef(x,formula):
    return smf.ols(formula,data=x).fit().params
In [9]: gamma = (df.groupby('caldt')
                 .apply(ols_coef,'ret ~ 1 + beta + r12to2 + r36to13'))
        gamma.head()
Out[10]:
            Intercept      beta     r12to2   r36to13
caldt
1963-07-01  -1.497012 -0.765721   4.379128 -1.918083
1963-08-01  11.144169 -6.506291   5.961584 -2.598048
1963-09-01  -2.330966 -0.741550  10.508617 -4.377293
1963-10-01   0.441941  1.127567   5.478114 -2.057173
1963-11-01   3.380485 -4.792643   3.660940 -1.210426
Then just compute the mean, the standard error of the mean, and a t-statistic (or whatever statistics you want). Something like the following:
def fm_summary(p):
    s = p.describe().T
    s['std_error'] = s['std']/np.sqrt(s['count'])
    s['tstat'] = s['mean']/s['std_error']
    return s[['mean','std_error','tstat']]
In [12]: fm_summary(gamma)
Out[12]:
               mean  std_error     tstat
Intercept  0.754904   0.177291  4.258000
beta      -0.012176   0.202629 -0.060092
r12to2     1.794548   0.356069  5.039896
r36to13    0.237873   0.186680  1.274230
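As a cross-check on the summary step, the same three statistics can also be obtained from DataFrame.mean() and DataFrame.sem(), since sem() is the sample standard deviation (ddof=1) divided by the square root of the number of observations. The sketch below uses a small made-up gamma frame (hypothetical values, not the industry data above) to show that the two routes agree:

```python
import numpy as np
import pandas as pd

# Hypothetical per-period coefficients (one row per date), standing in for `gamma`.
gamma = pd.DataFrame(
    {"Intercept": [0.1, 0.2, 0.3, 0.4], "beta": [1.0, 2.0, 3.0, 4.0]},
    index=pd.date_range("2020-01-01", periods=4, freq="MS"),
)

def fm_summary(p):
    # Mean, standard error of the mean, and t-statistic per coefficient.
    s = p.describe().T
    s["std_error"] = s["std"] / np.sqrt(s["count"])
    s["tstat"] = s["mean"] / s["std_error"]
    return s[["mean", "std_error", "tstat"]]

# Shortcut: sem() already divides the ddof=1 standard deviation by sqrt(n).
shortcut = pd.DataFrame({
    "mean": gamma.mean(),
    "std_error": gamma.sem(),
    "tstat": gamma.mean() / gamma.sem(),
})

print(fm_summary(gamma))
print(shortcut)
```

Both use the plain time-series standard error, with no adjustment for autocorrelation in the coefficient series.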
Improving Speed
Using statsmodels for the regressions has significant overhead (particularly given that you only need the estimated coefficients). If you want better efficiency, you could switch from statsmodels to numpy.linalg.lstsq. Write a new function that does the OLS estimation, something like the following (note that I'm not doing anything like checking the rank of these matrices):
def ols_np(data,yvar,xvar):
    gamma,_,_,_ = np.linalg.lstsq(data[xvar],data[yvar],rcond=None)
    return pd.Series(gamma)
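A usage sketch for the numpy version, under a few assumptions: the panel below is synthetic (made-up data, not industry.csv), and the function is a slight variant of ols_np that labels the coefficients with the regressor names. Unlike a patsy formula, lstsq does not add a constant, so an intercept column has to be included explicitly:

```python
import numpy as np
import pandas as pd

def ols_np(data, yvar, xvar):
    # Least-squares coefficients only: no rank check, no standard errors.
    gamma, _, _, _ = np.linalg.lstsq(
        data[xvar].to_numpy(), data[yvar].to_numpy(), rcond=None
    )
    # Label the coefficients so the grouped result gets named columns.
    return pd.Series(gamma, index=xvar)

# Synthetic panel (hypothetical): 4 months x 5 entities.
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-01", periods=4, freq="MS")
panel = pd.DataFrame({
    "caldt": dates.repeat(5),
    "beta": rng.normal(1.0, 0.3, size=20),
})
panel["ret"] = 0.5 + 2.0 * panel["beta"] + rng.normal(0.0, 0.1, size=20)

# lstsq does not add a constant, so include one as a regressor.
panel["intercept"] = 1.0

# One cross-sectional regression per month; rows are dates, columns are coefficients.
gamma = panel.groupby("caldt").apply(ols_np, "ret", ["intercept", "beta"])
print(gamma)
```

The resulting gamma frame has the same layout as the statsmodels-based one, so it can be passed to the same summary step.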
And if you're still using an older version of pandas, the following will work.
Here is an example of using the fama_macbeth function in pandas:
>>> df
                 y    x
date       id
2012-01-01 1   0.1  0.4
           2   0.3  0.6
           3   0.4  0.2
           4   0.0  1.2
2012-02-01 1   0.2  0.7
           2   0.4  0.5
           3   0.2  0.1
           4   0.1  0.0
2012-03-01 1   0.4  0.8
           2   0.6  0.1
           3   0.7  0.6
           4   0.4 -0.1
Notice the structure. The fama_macbeth function expects the y-var and x-vars to have a MultiIndex with the date as the first level and the stock/firm/entity id as the second level:
>>> fm = pd.fama_macbeth(y=df['y'],x=df[['x']])
>>> fm
----------------------Summary of Fama-MacBeth Analysis-------------------------
Formula: Y ~ x + intercept
# betas : 3
----------------------Summary of Estimated Coefficients------------------------
Variable          Beta       Std Err        t-stat       CI 2.5%      CI 97.5%
(x)            -0.0227        0.1276         -0.18       -0.2728        0.2273
(intercept)     0.3531        0.0842          4.19        0.1881        0.5181
--------------------------------End of Summary---------------------------------
Note that just printing fm calls fm.summary:
>>> fm.summary
----------------------Summary of Fama-MacBeth Analysis-------------------------
Formula: Y ~ x + intercept
# betas : 3
----------------------Summary of Estimated Coefficients------------------------
Variable          Beta       Std Err        t-stat       CI 2.5%      CI 97.5%
(x)            -0.0227        0.1276         -0.18       -0.2728        0.2273
(intercept)     0.3531        0.0842          4.19        0.1881        0.5181
--------------------------------End of Summary---------------------------------
Also, note that the fama_macbeth function automatically adds an intercept (unlike the statsmodels routines). The x-var also has to be a DataFrame, so if you pass just one column you need to pass it as df[['x']].
If you don't want an intercept, you have to do:
>>> fm = pd.fama_macbeth(y=df['y'],x=df[['x']],intercept=False)
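Since pd.fama_macbeth is no longer available in current pandas, the Beta, Std Err and t-stat columns of the summary for the small 12-row example can be approximated with a groupby over the date level. This is a sketch, not the removed function's implementation. The helper name cross_section_ols is made up here, and ddof=0 is an assumption chosen because it appears to match the printed Std Err values; the sample standard deviation (ddof=1) would give larger standard errors:

```python
import numpy as np
import pandas as pd

# Rebuild the small example panel: MultiIndex (date, id) with columns y and x.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2012-01-01", "2012-02-01", "2012-03-01"]), [1, 2, 3, 4]],
    names=["date", "id"],
)
df = pd.DataFrame(
    {
        "y": [0.1, 0.3, 0.4, 0.0, 0.2, 0.4, 0.2, 0.1, 0.4, 0.6, 0.7, 0.4],
        "x": [0.4, 0.6, 0.2, 1.2, 0.7, 0.5, 0.1, 0.0, 0.8, 0.1, 0.6, -0.1],
    },
    index=index,
)

def cross_section_ols(group):
    # One cross-sectional regression of y on x with an explicit intercept column.
    X = np.column_stack([group["x"].to_numpy(), np.ones(len(group))])
    coef, *_ = np.linalg.lstsq(X, group["y"].to_numpy(), rcond=None)
    return pd.Series(coef, index=["x", "intercept"])

# One row of coefficients per date.
betas = df.groupby(level="date").apply(cross_section_ols)

summary = pd.DataFrame({"beta": betas.mean()})
# ddof=0 is an assumption made to line up with the printed summary.
summary["std_err"] = betas.std(ddof=0) / np.sqrt(len(betas))
summary["t_stat"] = summary["beta"] / summary["std_err"]
print(summary.round(4))
```

Dropping the column of ones from X corresponds to the intercept=False case.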