Python pandas 使用滚动应用到 groupby 对象以矢量化方式计算机车车辆 beta [英] Python pandas calculate rolling stock beta using rolling apply to groupby object in vectorized fashion
问题描述
我有一个包含 4 列的大型数据框 df:
I have a large data frame, df, containing 4 columns:
id period ret_1m mkt_ret_1m
131146 CAN00WG0 199609 -0.1538 0.047104
133530 CAN00WG0 199610 -0.0455 -0.014143
135913 CAN00WG0 199611 0.0000 0.040926
138334 CAN00WG0 199612 0.2952 0.008723
140794 CAN00WG0 199701 -0.0257 0.039916
143274 CAN00WG0 199702 -0.0038 -0.025442
145754 CAN00WG0 199703 -0.2992 -0.049279
148246 CAN00WG0 199704 -0.0919 -0.005948
150774 CAN00WG0 199705 0.0595 0.122322
153318 CAN00WG0 199706 -0.0337 0.045765
id period ret_1m mkt_ret_1m
160980 CAN00WH0 199709 0.0757 0.079293
163569 CAN00WH0 199710 -0.0741 -0.044000
166159 CAN00WH0 199711 0.1000 -0.014644
168782 CAN00WH0 199712 -0.0909 -0.007072
171399 CAN00WH0 199801 -0.0100 0.001381
174022 CAN00WH0 199802 0.1919 0.081924
176637 CAN00WH0 199803 0.0085 0.050415
179255 CAN00WH0 199804 -0.0168 0.018393
181880 CAN00WH0 199805 0.0427 -0.051279
184516 CAN00WH0 199806 -0.0656 -0.011516
id period ret_1m mkt_ret_1m
143275 CAN00WO0 199702 -0.1176 -0.025442
145755 CAN00WO0 199703 -0.0074 -0.049279
148247 CAN00WO0 199704 -0.0075 -0.005948
150775 CAN00WO0 199705 0.0451 0.122322
等
我正在尝试使用一个函数计算一个常见的财务指标,称为 beta,该函数采用两列,ret_1m,每月的 stock_return 和 ret_1m_mkt,即同一时期 (period_id) 的市场 1 个月回报.我想应用一个函数 (calc_beta) 来计算这个函数在 12 个月滚动基础上的 12 个月结果.
I am attempting to calculate a common financial measure, known as beta, using a function, that takes two of the columns, ret_1m, the monthly stock_return, and ret_1m_mkt, the market 1 month return for the same period (period_id). I want to apply a function (calc_beta) to calculate the 12-month result of this function on a 12 month rolling basis.
为此,我创建了一个 groupby 对象:
To do this, I am creating a groupby object:
grp = df.groupby('id')
我想做的是使用类似的东西:
What I would like to do is use something like:
period = 12
for stock, sub_df in grp:
arg = sub_df[['ret_1m', 'mkt_ret_1m']]
beta = pd.rolling_apply(arg, period, calc_beta, min_periods = period)
现在,这是第一个问题.根据文档, pd.rolling_apply arg 可以是系列或数据框.但是,我提供的数据框似乎被转换为一个只能包含一列数据的 numpy 数组,而不是我试图提供的两列数据.所以我下面的 calc_beta 代码将不起作用,因为我需要同时传递股票和市场回报:
Now, here is the first problem. According to the documentation, pd.rolling_apply arg can be either a series or a data frame. However, it appears that the data frame I supply is converted into a numpy array that can only contain one column of data, rather than the two I have tried to supply. So my code below for calc_beta will not work, because I need to pass both the stock and market returns:
def calc_beta(np_array)
s = np_array[:,0] # stock returns are column zero from numpy array
m = np_array[:,1] # market returns are column one from numpy array
covariance = np.cov(s,m) # Calculate covariance between stock and market
beta = covariance[0,1]/covariance[1,1]
return beta
所以我的问题如下,我认为这样列出它们是有道理的:
So my questions are as follows, I think it makes sense to list them in this way:
(i) How can I pass a data frame/multiple series/numpy array with more than one column to calc_beta using rolling_apply?
(ii) How can I return more than one value (e.g. the beta) from the calc_beta function?
(iii) Having calculated rolling quantities, how can I recombined with the original dataframe df so that I have the rolling quantities corresponding to the correct date in the period column?
(iv) Is there a better (vectorized) way of achieving this? I have seen some similar questions using e.g. df.apply(pd.rolling_apply,period,??) but I did not understand how these worked.
我认为rolling_apply 以前无法处理数据帧,但文档表明它现在能够这样做.我的熊猫.版本是 0.16.1.
I gather that rolling_apply previously was unable to handle data frames, but the documentations suggests that it is now able to do so. My pandas.version is 0.16.1.
感谢您的帮助!我花了 1.5 天的时间试图弄清楚这一点,但完全被难住了.
Thanks for any help! I have lost 1.5 days trying to figure this out and am totally stumped.
最终,我想要的是这样的:
Ultimately, what I want is something like this:
id period ret_1m mkt_ret_1m beta other_quantities
131146 CAN00WG0 199609 -0.1538 0.047104 0.521 xxx
133530 CAN00WG0 199610 -0.0455 -0.014143 0.627 xxxx
135913 CAN00WG0 199611 0.0000 0.040926 0.341 xxx
138334 CAN00WG0 199612 0.2952 0.008723 0.567 xx
140794 CAN00WG0 199701 -0.0257 0.039916 0.4612 xxx
143274 CAN00WG0 199702 -0.0038 -0.025442 0.215 xxx
145754 CAN00WG0 199703 -0.2992 -0.049279 0.4678 xxx
148246 CAN00WG0 199704 -0.0919 -0.005948 -0.4225 xxx
150774 CAN00WG0 199705 0.0595 0.122322 0.780 xxx
153318 CAN00WG0 199706 -0.0337 0.045765 0.623 xxx
id period ret_1m mkt_ret_1m beta other_quantities
160980 CAN00WH0 199709 0.0757 0.079293 -0.913 xx
163569 CAN00WH0 199710 -0.0741 -0.044000 0.894 xxx
166159 CAN00WH0 199711 0.1000 -0.014644 0.563 xxx
168782 CAN00WH0 199712 -0.0909 -0.007072 0.734 xxx
171399 CAN00WH0 199801 -0.0100 0.001381 0.894 xxxx
174022 CAN00WH0 199802 0.1919 0.081924 0.789 xx
176637 CAN00WH0 199803 0.0085 0.050415 0.1563 xxxx
179255 CAN00WH0 199804 -0.0168 0.018393 -0.64 xxxx
181880 CAN00WH0 199805 0.0427 -0.051279 -0.742 xxx
184516 CAN00WH0 199806 -0.0656 -0.011516 0.925 xxx
id period ret_1m mkt_ret_1m beta
143275 CAN00WO0 199702 -0.1176 -0.025442 -1.52 xx
145755 CAN00WO0 199703 -0.0074 -0.049279 -0.632 xxx
148247 CAN00WO0 199704 -0.0075 -0.005948 1.521 xx
150775 CAN00WO0 199705 0.0451 0.122322 0.0321 xxx
等
推荐答案
def rolling_apply(df, period, func, min_periods=None):
if min_periods is None:
min_periods = period
result = pd.Series(np.nan, index=df.index)
for i in range(1, len(df)):
sub_df = df.iloc[max(i-period, 0):i,:] #get a subsample to run
if len(sub_df) >= min_periods:
idx = sub_df.index[-1]+1 # mind the forward looking bias,your return in time t should not be inclued in the beta calculating in time t
result[idx] = func(sub_df)
return result
我修正了 Happy001 的代码的前瞻性偏差.这是一个财务问题,所以应该谨慎.
I fix a forward looking bias for Happy001's code. It's a finance problem, so it should be cautious.
我发现 vlmercado 的回答太错误了.如果您只是使用 pd.rolling_cov 和 pd.rolling_var,那么您就在金融领域犯了错误.首先,很明显,第二只股票 CAN00WH0 没有任何 NaN beta,因为它使用了 CAN00WG0 的回报,这是完全错误的.其次,考虑这样一种情况:一只股票停牌十年,你也可以把这个样本纳入你的贝塔计算中.
I find that vlmercado's answer is so wrong. If you simply use pd.rolling_cov and pd.rolling_var you are making mistakes in finance. Firstly, it's obvious that the second stock CAN00WH0 do not have any NaN beta, since it use the return of CAN00WG0, which is wrong at all. Secondly, consider such a situation: a stock suspended for ten years, and you can also get that sample into your beta calculating.
我发现 pandas.rolling 也适用于 Timestamp,如果有兴趣,您可以在我上面的回答中看到如何.我更改了 Happy001 的代码 的代码.这不是最快的方式,但至少比原始代码快 20 倍.
I find that pandas.rolling also works for Timestamp, you can see how in my answer above if interested. I change the code of Happy001's code . It's not the fastest way, but is at least 20x faster than the origin code.
crsp_daily['date']=pd.to_datetime(crsp_daily['date'])
crsp_daily=crsp_daily.set_index('date') # rolling needs a time serie index
crsp_daily.index=pd.DatetimeIndex(crsp_daily.index)
calc=crsp_daily[['permno','ret','mkt_ret']]
grp = calc.groupby('permno') #rolling beta for each stock
beta=pd.DataFrame()
for stock, sub_df in grp:
sub2_df=sub_df[['ret','mkt_ret']].sort_index()
beta_m = sub2_df.rolling('1825d',min_periods=150).cov() # 5yr rolling beta , note that d for day, and you cannot use w/m/y, s/d are availiable.
beta_m['beta']=beta_m['ret']/beta_m['mkt_ret']
beta_m=beta_m.xs('mkt_ret',level=1,axis=0)
beta=beta.append(pd.merge(sub_df,pd.DataFrame(beta_m['beta'])))
beta=beta.reset_index()
beta=beta[['date','permno','beta']]
这篇关于Python pandas 使用滚动应用到 groupby 对象以矢量化方式计算机车车辆 beta的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!