Python pandas 使用滚动矢量化方式将其应用于groupby对象来计算机车车辆的beta [英] Python pandas calculate rolling stock beta using rolling apply to groupby object in vectorized fashion

查看:45
本文介绍了Python pandas 使用滚动矢量化方式将其应用于groupby对象来计算机车车辆的beta的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个大数据框df,其中包含4列:

I have a large data frame, df, containing 4 columns:

             id           period  ret_1m   mkt_ret_1m
131146       CAN00WG0     199609 -0.1538    0.047104
133530       CAN00WG0     199610 -0.0455   -0.014143
135913       CAN00WG0     199611  0.0000    0.040926
138334       CAN00WG0     199612  0.2952    0.008723
140794       CAN00WG0     199701 -0.0257    0.039916
143274       CAN00WG0     199702 -0.0038   -0.025442
145754       CAN00WG0     199703 -0.2992   -0.049279
148246       CAN00WG0     199704 -0.0919   -0.005948
150774       CAN00WG0     199705  0.0595    0.122322
153318       CAN00WG0     199706 -0.0337    0.045765

             id           period  ret_1m   mkt_ret_1m
160980       CAN00WH0     199709  0.0757    0.079293
163569       CAN00WH0     199710 -0.0741   -0.044000
166159       CAN00WH0     199711  0.1000   -0.014644
168782       CAN00WH0     199712 -0.0909   -0.007072
171399       CAN00WH0     199801 -0.0100    0.001381
174022       CAN00WH0     199802  0.1919    0.081924
176637       CAN00WH0     199803  0.0085    0.050415
179255       CAN00WH0     199804 -0.0168    0.018393
181880       CAN00WH0     199805  0.0427   -0.051279
184516       CAN00WH0     199806 -0.0656   -0.011516

             id           period  ret_1m   mkt_ret_1m
143275       CAN00WO0     199702 -0.1176   -0.025442
145755       CAN00WO0     199703 -0.0074   -0.049279
148247       CAN00WO0     199704 -0.0075   -0.005948
150775       CAN00WO0     199705  0.0451    0.122322

我正在尝试使用一个函数来计算一种称为b的通用财务指标,该函数使用两个列ret_1m(每月stock_return)和ret_1m_mkt(同一时期的市场1个月回报)(period_id).我想应用一个函数(calc_beta)在12个月的滚动基础上计算该函数的12个月结果.

I am attempting to calculate a common financial measure, known as beta, using a function, that takes two of the columns, ret_1m, the monthly stock_return, and ret_1m_mkt, the market 1 month return for the same period (period_id). I want to apply a function (calc_beta) to calculate the 12-month result of this function on a 12 month rolling basis.

为此,我正在创建一个groupby对象:

To do this, I am creating a groupby object:

grp = df.groupby('id')

我想做的是使用类似的东西:

What I would like to do is use something like:

period = 12
for stock, sub_df in grp:
    arg = sub_df[['ret_1m', 'mkt_ret_1m']]
    beta = pd.rolling_apply(arg, period, calc_beta, min_periods = period)

现在,这是第一个问题.根据文档,pd.rolling_apply arg可以是序列或数据框.但是,看来我提供的数据帧已转换为仅包含一列数据而不是我尝试提供的两列数据的numpy数组.因此,我下面的calc_beta代码无法正常工作,因为我需要同时传递股票和市场收益:

Now, here is the first problem. According to the documentation, pd.rolling_apply arg can be either a series or a data frame. However, it appears that the data frame I supply is converted into a numpy array that can only contain one column of data, rather than the two I have tried to supply. So my code below for calc_beta will not work, because I need to pass both the stock and market returns:

def calc_beta(np_array)
    s = np_array[:,0] # stock returns are column zero from numpy array
    m = np_array[:,1] # market returns are column one from numpy array

    covariance = np.cov(s,m) # Calculate covariance between stock and market
    beta = covariance[0,1]/covariance[1,1]
return beta

所以我的问题如下,我认为以这种方式列出它们是有道理的:

So my questions are as follows, I think it makes sense to list them in this way:

(i)  How can I pass a data frame/multiple series/numpy array with more than one column to calc_beta using rolling_apply?
(ii) How can I return more than one value (e.g. the beta) from the calc_beta function? 
(iii) Having calculated rolling quantities, how can I recombined with the original dataframe df so that I have the rolling quantities corresponding to the correct date in the period column?
(iv) Is there a better (vectorized) way of achieving this?  I have seen some similar questions using e.g. df.apply(pd.rolling_apply,period,??) but I did not understand how these worked.

我收集到以前rolling_apply无法处理数据帧,但是文档表明它现在可以处理数据帧.我的熊猫.版本是0.16.1.

I gather that rolling_apply previously was unable to handle data frames, but the documentations suggests that it is now able to do so. My pandas.version is 0.16.1.

感谢您的帮助!我已经花了1.5天的时间试图弄清楚这一点,但完全陷入了困境.

Thanks for any help! I have lost 1.5 days trying to figure this out and am totally stumped.

最终,我想要的是这样的东西:

Ultimately, what I want is something like this:

             id           period  ret_1m   mkt_ret_1m  beta  other_quantities
131146       CAN00WG0     199609 -0.1538    0.047104  0.521  xxx
133530       CAN00WG0     199610 -0.0455   -0.014143  0.627  xxxx
135913       CAN00WG0     199611  0.0000    0.040926  0.341  xxx
138334       CAN00WG0     199612  0.2952    0.008723  0.567  xx
140794       CAN00WG0     199701 -0.0257    0.039916  0.4612 xxx
143274       CAN00WG0     199702 -0.0038   -0.025442  0.215  xxx
145754       CAN00WG0     199703 -0.2992   -0.049279  0.4678  xxx
148246       CAN00WG0     199704 -0.0919   -0.005948  -0.4225  xxx
150774       CAN00WG0     199705  0.0595    0.122322  0.780  xxx
153318       CAN00WG0     199706 -0.0337    0.045765  0.623  xxx

             id           period  ret_1m   mkt_ret_1m  beta  other_quantities
160980       CAN00WH0     199709  0.0757    0.079293  -0.913  xx
163569       CAN00WH0     199710 -0.0741   -0.044000  0.894  xxx
166159       CAN00WH0     199711  0.1000   -0.014644  0.563  xxx
168782       CAN00WH0     199712 -0.0909   -0.007072  0.734  xxx
171399       CAN00WH0     199801 -0.0100    0.001381  0.894  xxxx
174022       CAN00WH0     199802  0.1919    0.081924  0.789  xx
176637       CAN00WH0     199803  0.0085    0.050415  0.1563  xxxx
179255       CAN00WH0     199804 -0.0168    0.018393  -0.64  xxxx
181880       CAN00WH0     199805  0.0427   -0.051279  -0.742  xxx
184516       CAN00WH0     199806 -0.0656   -0.011516  0.925  xxx

             id           period  ret_1m   mkt_ret_1m  beta
143275       CAN00WO0     199702 -0.1176   -0.025442  -1.52  xx
145755       CAN00WO0     199703 -0.0074   -0.049279  -0.632  xxx
148247       CAN00WO0     199704 -0.0075   -0.005948  1.521  xx
150775       CAN00WO0     199705  0.0451    0.122322  0.0321  xxx

推荐答案

我猜pd.rolling_apply在这种情况下无济于事,因为在我看来,它基本上只需要Series(即使已传递数据帧) ,它一次处理一列).但是,您始终可以编写自己的rolling_apply,它需要一个数据帧.

I guess pd.rolling_apply doesn't help in this case since it seems to me that it essentially only takes a Series (Even if a dataframe is passed, it's processing one column a time). But you can always write your own rolling_apply that takes a dataframe.

import pandas as pd
import numpy as np
from StringIO import StringIO

df = pd.read_csv(StringIO('''              id  period  ret_1m  mkt_ret_1m
131146  CAN00WG0  199609 -0.1538    0.047104
133530  CAN00WG0  199610 -0.0455   -0.014143
135913  CAN00WG0  199611  0.0000    0.040926
138334  CAN00WG0  199612  0.2952    0.008723
140794  CAN00WG0  199701 -0.0257    0.039916
143274  CAN00WG0  199702 -0.0038   -0.025442
145754  CAN00WG0  199703 -0.2992   -0.049279
148246  CAN00WG0  199704 -0.0919   -0.005948
150774  CAN00WG0  199705  0.0595    0.122322
153318  CAN00WG0  199706 -0.0337    0.045765
160980  CAN00WH0  199709  0.0757    0.079293
163569  CAN00WH0  199710 -0.0741   -0.044000
166159  CAN00WH0  199711  0.1000   -0.014644
168782  CAN00WH0  199712 -0.0909   -0.007072
171399  CAN00WH0  199801 -0.0100    0.001381
174022  CAN00WH0  199802  0.1919    0.081924
176637  CAN00WH0  199803  0.0085    0.050415
179255  CAN00WH0  199804 -0.0168    0.018393
181880  CAN00WH0  199805  0.0427   -0.051279
184516  CAN00WH0  199806 -0.0656   -0.011516
143275  CAN00WO0  199702 -0.1176   -0.025442
145755  CAN00WO0  199703 -0.0074   -0.049279
148247  CAN00WO0  199704 -0.0075   -0.005948
150775  CAN00WO0  199705  0.0451    0.122322'''), sep='\s+')



def calc_beta(df):
    np_array = df.values
    s = np_array[:,0] # stock returns are column zero from numpy array
    m = np_array[:,1] # market returns are column one from numpy array

    covariance = np.cov(s,m) # Calculate covariance between stock and market
    beta = covariance[0,1]/covariance[1,1]
    return beta

def rolling_apply(df, period, func, min_periods=None):
    if min_periods is None:
        min_periods = period
    result = pd.Series(np.nan, index=df.index)

    for i in range(1, len(df)+1):
        sub_df = df.iloc[max(i-period, 0):i,:] #I edited here
        if len(sub_df) >= min_periods:
            idx = sub_df.index[-1]
            result[idx] = func(sub_df)
    return result

df['beta'] = np.nan
grp = df.groupby('id')
period = 6 #I'm using 6  to see some not NaN values, since sample data don't have longer than 12 groups
for stock, sub_df in grp:
    beta = rolling_apply(sub_df[['ret_1m','mkt_ret_1m']], period, calc_beta, min_periods = period)  
    beta.name = 'beta'
    df.update(beta)
print df

输出

            id  period  ret_1m  mkt_ret_1m      beta
131146  CAN00WG0  199609 -0.1538    0.047104       NaN
133530  CAN00WG0  199610 -0.0455   -0.014143       NaN
135913  CAN00WG0  199611  0.0000    0.040926       NaN
138334  CAN00WG0  199612  0.2952    0.008723       NaN
140794  CAN00WG0  199701 -0.0257    0.039916       NaN
143274  CAN00WG0  199702 -0.0038   -0.025442 -1.245908
145754  CAN00WG0  199703 -0.2992   -0.049279  2.574464
148246  CAN00WG0  199704 -0.0919   -0.005948  2.657887
150774  CAN00WG0  199705  0.0595    0.122322  1.371090
153318  CAN00WG0  199706 -0.0337    0.045765  1.494095
...          ...     ...     ...         ...       ...
171399  CAN00WH0  199801 -0.0100    0.001381       NaN
174022  CAN00WH0  199802  0.1919    0.081924  1.542782
176637  CAN00WH0  199803  0.0085    0.050415  1.605407
179255  CAN00WH0  199804 -0.0168    0.018393  1.571015
181880  CAN00WH0  199805  0.0427   -0.051279  1.139972
184516  CAN00WH0  199806 -0.0656   -0.011516  1.101890
143275  CAN00WO0  199702 -0.1176   -0.025442       NaN
145755  CAN00WO0  199703 -0.0074   -0.049279       NaN
148247  CAN00WO0  199704 -0.0075   -0.005948       NaN
150775  CAN00WO0  199705  0.0451    0.122322       NaN

这篇关于Python pandas 使用滚动矢量化方式将其应用于groupby对象来计算机车车辆的beta的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆