Python pandas 使用滚动应用到 groupby 对象以矢量化方式计算机车车辆 beta [英] Python pandas calculate rolling stock beta using rolling apply to groupby object in vectorized fashion

查看:26
本文介绍了Python pandas 使用滚动应用到 groupby 对象以矢量化方式计算机车车辆 beta的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个包含 4 列的大型数据框 df:

I have a large data frame, df, containing 4 columns:

             id           period  ret_1m   mkt_ret_1m
131146       CAN00WG0     199609 -0.1538    0.047104
133530       CAN00WG0     199610 -0.0455   -0.014143
135913       CAN00WG0     199611  0.0000    0.040926
138334       CAN00WG0     199612  0.2952    0.008723
140794       CAN00WG0     199701 -0.0257    0.039916
143274       CAN00WG0     199702 -0.0038   -0.025442
145754       CAN00WG0     199703 -0.2992   -0.049279
148246       CAN00WG0     199704 -0.0919   -0.005948
150774       CAN00WG0     199705  0.0595    0.122322
153318       CAN00WG0     199706 -0.0337    0.045765

             id           period  ret_1m   mkt_ret_1m
160980       CAN00WH0     199709  0.0757    0.079293
163569       CAN00WH0     199710 -0.0741   -0.044000
166159       CAN00WH0     199711  0.1000   -0.014644
168782       CAN00WH0     199712 -0.0909   -0.007072
171399       CAN00WH0     199801 -0.0100    0.001381
174022       CAN00WH0     199802  0.1919    0.081924
176637       CAN00WH0     199803  0.0085    0.050415
179255       CAN00WH0     199804 -0.0168    0.018393
181880       CAN00WH0     199805  0.0427   -0.051279
184516       CAN00WH0     199806 -0.0656   -0.011516

             id           period  ret_1m   mkt_ret_1m
143275       CAN00WO0     199702 -0.1176   -0.025442
145755       CAN00WO0     199703 -0.0074   -0.049279
148247       CAN00WO0     199704 -0.0075   -0.005948
150775       CAN00WO0     199705  0.0451    0.122322

我正在尝试使用一个函数计算一个常见的财务指标,称为 beta,该函数采用两列,ret_1m,每月的 stock_return 和 ret_1m_mkt,即同一时期 (period_id) 的市场 1 个月回报.我想应用一个函数 (calc_beta) 来计算这个函数在 12 个月滚动基础上的 12 个月结果.

I am attempting to calculate a common financial measure, known as beta, using a function, that takes two of the columns, ret_1m, the monthly stock_return, and ret_1m_mkt, the market 1 month return for the same period (period_id). I want to apply a function (calc_beta) to calculate the 12-month result of this function on a 12 month rolling basis.

为此,我创建了一个 groupby 对象:

To do this, I am creating a groupby object:

grp = df.groupby('id')

我想做的是使用类似的东西:

What I would like to do is use something like:

period = 12
for stock, sub_df in grp:
    arg = sub_df[['ret_1m', 'mkt_ret_1m']]
    beta = pd.rolling_apply(arg, period, calc_beta, min_periods = period)

现在,这是第一个问题.根据文档, pd.rolling_apply arg 可以是系列或数据框.但是,我提供的数据框似乎被转换为一个只能包含一列数据的 numpy 数组,而不是我试图提供的两列数据.所以我下面的 calc_beta 代码将不起作用,因为我需要同时传递股票和市场回报:

Now, here is the first problem. According to the documentation, pd.rolling_apply arg can be either a series or a data frame. However, it appears that the data frame I supply is converted into a numpy array that can only contain one column of data, rather than the two I have tried to supply. So my code below for calc_beta will not work, because I need to pass both the stock and market returns:

def calc_beta(np_array)
    s = np_array[:,0] # stock returns are column zero from numpy array
    m = np_array[:,1] # market returns are column one from numpy array

    covariance = np.cov(s,m) # Calculate covariance between stock and market
    beta = covariance[0,1]/covariance[1,1]
return beta

所以我的问题如下,我认为这样列出它们是有道理的:

So my questions are as follows, I think it makes sense to list them in this way:

(i)  How can I pass a data frame/multiple series/numpy array with more than one column to calc_beta using rolling_apply?
(ii) How can I return more than one value (e.g. the beta) from the calc_beta function? 
(iii) Having calculated rolling quantities, how can I recombined with the original dataframe df so that I have the rolling quantities corresponding to the correct date in the period column?
(iv) Is there a better (vectorized) way of achieving this?  I have seen some similar questions using e.g. df.apply(pd.rolling_apply,period,??) but I did not understand how these worked.

我认为rolling_apply 以前无法处理数据帧,但文档表明它现在能够这样做.我的熊猫.版本是 0.16.1.

I gather that rolling_apply previously was unable to handle data frames, but the documentations suggests that it is now able to do so. My pandas.version is 0.16.1.

感谢您的帮助!我花了 1.5 天的时间试图弄清楚这一点,但完全被难住了.

Thanks for any help! I have lost 1.5 days trying to figure this out and am totally stumped.

最终,我想要的是这样的:

Ultimately, what I want is something like this:

             id           period  ret_1m   mkt_ret_1m  beta  other_quantities
131146       CAN00WG0     199609 -0.1538    0.047104  0.521  xxx
133530       CAN00WG0     199610 -0.0455   -0.014143  0.627  xxxx
135913       CAN00WG0     199611  0.0000    0.040926  0.341  xxx
138334       CAN00WG0     199612  0.2952    0.008723  0.567  xx
140794       CAN00WG0     199701 -0.0257    0.039916  0.4612 xxx
143274       CAN00WG0     199702 -0.0038   -0.025442  0.215  xxx
145754       CAN00WG0     199703 -0.2992   -0.049279  0.4678  xxx
148246       CAN00WG0     199704 -0.0919   -0.005948  -0.4225  xxx
150774       CAN00WG0     199705  0.0595    0.122322  0.780  xxx
153318       CAN00WG0     199706 -0.0337    0.045765  0.623  xxx

             id           period  ret_1m   mkt_ret_1m  beta  other_quantities
160980       CAN00WH0     199709  0.0757    0.079293  -0.913  xx
163569       CAN00WH0     199710 -0.0741   -0.044000  0.894  xxx
166159       CAN00WH0     199711  0.1000   -0.014644  0.563  xxx
168782       CAN00WH0     199712 -0.0909   -0.007072  0.734  xxx
171399       CAN00WH0     199801 -0.0100    0.001381  0.894  xxxx
174022       CAN00WH0     199802  0.1919    0.081924  0.789  xx
176637       CAN00WH0     199803  0.0085    0.050415  0.1563  xxxx
179255       CAN00WH0     199804 -0.0168    0.018393  -0.64  xxxx
181880       CAN00WH0     199805  0.0427   -0.051279  -0.742  xxx
184516       CAN00WH0     199806 -0.0656   -0.011516  0.925  xxx

             id           period  ret_1m   mkt_ret_1m  beta
143275       CAN00WO0     199702 -0.1176   -0.025442  -1.52  xx
145755       CAN00WO0     199703 -0.0074   -0.049279  -0.632  xxx
148247       CAN00WO0     199704 -0.0075   -0.005948  1.521  xx
150775       CAN00WO0     199705  0.0451    0.122322  0.0321  xxx

推荐答案

def rolling_apply(df, period, func, min_periods=None):
    if min_periods is None:
        min_periods = period
    result = pd.Series(np.nan, index=df.index)

    for i in range(1, len(df)):
        sub_df = df.iloc[max(i-period, 0):i,:] #get a subsample to run
        if len(sub_df) >= min_periods:
            idx = sub_df.index[-1]+1 # mind the forward looking bias,your return in time t should not be inclued in the beta calculating in time t
            result[idx] = func(sub_df)
    return result

我修正了 Happy001 的代码的前瞻性偏差.这是一个财务问题,所以应该谨慎.

I fix a forward looking bias for Happy001's code. It's a finance problem, so it should be cautious.

我发现 vlmercado 的回答太错误了.如果您只是使用 pd.rolling_cov 和 pd.rolling_var,那么您就在金融领域犯了错误.首先,很明显,第二只股票 CAN00WH0 没有任何 NaN beta,因为它使用了 CAN00WG0 的回报,这是完全错误的.其次,考虑这样一种情况:一只股票停牌十年,你也可以把这个样本纳入你的贝塔计算中.

I find that vlmercado's answer is so wrong. If you simply use pd.rolling_cov and pd.rolling_var you are making mistakes in finance. Firstly, it's obvious that the second stock CAN00WH0 do not have any NaN beta, since it use the return of CAN00WG0, which is wrong at all. Secondly, consider such a situation: a stock suspended for ten years, and you can also get that sample into your beta calculating.

我发现 pandas.rolling 也适用于 Timestamp,如果有兴趣,您可以在我上面的回答中看到如何.我更改了 Happy001 的代码 的代码.这不是最快的方式,但至少比原始代码快 20 倍.

I find that pandas.rolling also works for Timestamp, you can see how in my answer above if interested. I change the code of Happy001's code . It's not the fastest way, but is at least 20x faster than the origin code.

crsp_daily['date']=pd.to_datetime(crsp_daily['date'])
crsp_daily=crsp_daily.set_index('date') # rolling needs a time serie index
crsp_daily.index=pd.DatetimeIndex(crsp_daily.index)
calc=crsp_daily[['permno','ret','mkt_ret']]
grp = calc.groupby('permno') #rolling beta for each stock
beta=pd.DataFrame()
for stock, sub_df in grp:
        sub2_df=sub_df[['ret','mkt_ret']].sort_index() 
        beta_m = sub2_df.rolling('1825d',min_periods=150).cov() # 5yr rolling beta , note that d for day, and you cannot use w/m/y, s/d are availiable.
        beta_m['beta']=beta_m['ret']/beta_m['mkt_ret']
        beta_m=beta_m.xs('mkt_ret',level=1,axis=0)
        beta=beta.append(pd.merge(sub_df,pd.DataFrame(beta_m['beta'])))
beta=beta.reset_index()
beta=beta[['date','permno','beta']]

这篇关于Python pandas 使用滚动应用到 groupby 对象以矢量化方式计算机车车辆 beta的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆