Efficient Python Pandas Stock Beta Calculation on Many Dataframes

Views: 1066 | Published: 2017/3/26 0:02:21 | Tags: python, algorithm, performance, pandas, dataframe


Problem Description

 
I have many (4000+) CSVs of stock data (Date, Open, High, Low, Close) which I import into individual Pandas dataframes for analysis. I am new to Python and want to calculate a rolling 12-month beta for each stock. I found a post on calculating rolling beta (Python pandas calculate rolling stock beta using rolling apply to groupby object in vectorized fashion), but when I use it in my code below it takes over 2.5 hours! Considering I can run the exact same calculations on SQL tables in under 3 minutes, this is too slow.

How can I improve the performance of the code below to match that of SQL? I understand that Pandas/Python has that capability. My current method loops over each row, which I know slows performance, but I am unaware of any aggregate way to perform a rolling-window beta calculation on a dataframe.

Note: the first two steps, loading the CSVs into individual dataframes and calculating daily returns, take only ~20 seconds. All my CSV dataframes are stored in a dictionary called 'FilesLoaded' under names such as 'XAO'.

Your help would be much appreciated!
Thank you :)
import pandas as pd, numpy as np
import datetime
import ntpath
pd.set_option('display.precision',10)  #Set the Decimal Point precision to DISPLAY
start_time=datetime.datetime.now()

MarketIndex = 'XAO'
period = 250
MinBetaPeriod = period
# ***********************************************************************************************
# CALC RETURNS 
# ***********************************************************************************************
for File in FilesLoaded:
    FilesLoaded[File]['Return'] = FilesLoaded[File]['Close'].pct_change()
# ***********************************************************************************************
# CALC BETA
# ***********************************************************************************************
def calc_beta(df):
    np_array = df.values
    m = np_array[:,0] # market returns are column zero from numpy array
    s = np_array[:,1] # stock returns are column one from numpy array
    covariance = np.cov(s,m) # Calculate covariance between stock and market
    beta = covariance[0,1]/covariance[1,1]
    return beta

#Build Custom "Rolling_Apply" function
def rolling_apply(df, period, func, min_periods=None):
    if min_periods is None:
        min_periods = period
    result = pd.Series(np.nan, index=df.index)
    for i in range(1, len(df)+1):
        sub_df = df.iloc[max(i-period, 0):i,:]
        if len(sub_df) >= min_periods:  
            idx = sub_df.index[-1]
            result[idx] = func(sub_df)
    return result

#Create empty BETA dataframe with same index as RETURNS dataframe
df_join = pd.DataFrame(index=FilesLoaded[MarketIndex].index)    
df_join['market'] = FilesLoaded[MarketIndex]['Return']
df_join['stock'] = np.nan

for File in FilesLoaded:
    df_join['stock'].update(FilesLoaded[File]['Return'])
    df_join  = df_join.replace(np.inf, np.nan) #get rid of infinite values "inf" (SQL won't take "Inf")
    df_join  = df_join.replace(-np.inf, np.nan)#get rid of infinite values "inf" (SQL won't take "Inf")
    df_join  = df_join.fillna(0) #get rid of the NaNs in the return data
    FilesLoaded[File]['Beta'] = rolling_apply(df_join[['market','stock']], period, calc_beta, min_periods = MinBetaPeriod)

# ***********************************************************************************************
# CLEAN-UP
# ***********************************************************************************************
print('Run-time: {0}'.format(datetime.datetime.now() - start_time))

Solution
Generate Random Stock Data

40 Years of Monthly Data (480 Months) for 4,000 Stocks
dates = pd.date_range('1995-12-31', periods=480, freq='M', name='Date')
stoks = pd.Index(['s{:04d}'.format(i) for i in range(4000)])
df = pd.DataFrame(np.random.rand(480, 4000), dates, stoks)




df.iloc[:5, :5]




Roll Function

Returns groupby object ready to apply custom functions

See Source 
def roll(df, w):
    # stack df.values w-times shifted once at each stack
    roll_array = np.dstack([df.values[i:i+w, :] for i in range(len(df.index) - w + 1)]).T
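    # roll_array is now a 3-D array and can be read into
    # a pandas Panel object (note: pd.Panel was deprecated in
    # pandas 0.20 and removed in 0.25, so this needs an older pandas)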
    panel = pd.Panel(roll_array, 
                     items=df.index[w-1:],
                     major_axis=df.columns,
                     minor_axis=pd.Index(range(w), name='roll'))
    # convert to dataframe and pivot + groupby
    # is now ready for any action normally performed
    # on a groupby object
    return panel.to_frame().unstack().T.groupby(level=0)
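
Because pd.Panel is no longer available in current pandas releases, the roll function above will not run there as written. The following is a minimal sketch of a Panel-free stand-in, not the answer's original method. The name roll_no_panel and the small demo frame are hypothetical; the function is intended to produce the same grouping (one group per window end date, each holding w rows) by stacking the windows under a two-level index.

```python
import numpy as np
import pandas as pd


def roll_no_panel(df, w):
    """Hypothetical stand-in for roll(df, w) that avoids pd.Panel."""
    n_windows = len(df.index) - w + 1
    values = df.to_numpy()
    # Stack the w-row windows one after another:
    # resulting shape is (n_windows * w, n_columns)
    stacked = np.concatenate([values[i:i + w] for i in range(n_windows)], axis=0)
    # Label every row with its window's end date and its position in the window
    index = pd.MultiIndex.from_product(
        [df.index[w - 1:], range(w)], names=[df.index.name, 'roll'])
    # Group by the end date so each group is one w-row window
    return pd.DataFrame(stacked, index=index, columns=df.columns).groupby(level=0)


# Small illustrative frame (assumed data, not from the question)
demo_dates = pd.date_range('2000-01-01', periods=6, freq='D', name='Date')
demo = pd.DataFrame(np.arange(12.0).reshape(6, 2),
                    index=demo_dates, columns=['Market', 's0001'])
demo_groups = roll_no_panel(demo, 3)
```

The returned object can be used in the same way as the original, for example roll_no_panel(df, 12).apply(beta). Note that it materializes every window, so memory use grows with the window length.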




Beta Function

Use closed form solution of OLS regression

Assume column 0 is market

See Source
def beta(df):
    # first column is the market
    X = df.values[:, [0]]
    # prepend a column of ones for the intercept
    X = np.concatenate([np.ones_like(X), X], axis=1)
    # matrix algebra
    b = np.linalg.pinv(X.T.dot(X)).dot(X.T).dot(df.values[:, 1:])
    return pd.Series(b[1], df.columns[1:], name='Beta')
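
To make the link between this closed-form regression and the covariance-based calc_beta explicit, here is a short derivation sketch. The symbols are introduced only for illustration: m_t and s_t denote the market and stock returns in one window of length w, and X is the matrix whose columns are a vector of ones and the market returns. It assumes the market return is not constant within the window, so that the pseudo-inverse coincides with the ordinary inverse.

```latex
\hat{b} = (X^{\top} X)^{-1} X^{\top} s,
\qquad
X = \begin{bmatrix} 1 & m_1 \\ \vdots & \vdots \\ 1 & m_w \end{bmatrix}

\hat{\beta} = \hat{b}_2
= \frac{\sum_{t=1}^{w} (m_t - \bar{m})(s_t - \bar{s})}{\sum_{t=1}^{w} (m_t - \bar{m})^2}
= \frac{\widehat{\operatorname{Cov}}(m, s)}{\widehat{\operatorname{Var}}(m)}
```

The 1/(w-1) normalizing factors of the sample covariance and sample variance cancel in the ratio, which is why the slope row b[1] agrees with covariance[0,1]/covariance[1,1] from np.cov.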




Demonstration
rdf = roll(df, 12)
betas = rdf.apply(beta)




Timing





Validation

Compare calculations with OP
def calc_beta(df):
    np_array = df.values
    m = np_array[:,0] # market returns are column zero from numpy array
    s = np_array[:,1] # stock returns are column one from numpy array
    covariance = np.cov(s,m) # Calculate covariance between stock and market
    beta = covariance[0,1]/covariance[1,1]
    return beta




print(calc_beta(df.iloc[:12, :2]))

-0.311757542437




print(beta(df.iloc[:12, :2]))

s0001   -0.311758
Name: Beta, dtype: float64




Note the first cell of the rolled result below: it is the same value as in the validated calculations above.
betas = rdf.apply(beta)
betas.iloc[:5, :5]
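
As a further cross-check, the covariance-over-variance form of beta can also be computed with pandas' built-in rolling statistics, without materializing the windows. This is a sketch of an alternative technique, not the approach used in this answer. The names returns, stocks, market and rolling_betas are illustrative, and the data are random numbers generated only for the example.

```python
import numpy as np
import pandas as pd

# Simulated returns: one market column and four stock columns (assumed data)
rng = np.random.default_rng(0)
dates = pd.date_range('2000-01-01', periods=60, freq='D', name='Date')
returns = pd.DataFrame(rng.random((60, 5)), index=dates,
                       columns=['Market', 's0001', 's0002', 's0003', 's0004'])

w = 12  # window length
market = returns['Market']
stocks = returns.drop(columns='Market')

# Rolling covariance of every stock column with the market,
# divided row by row by the rolling variance of the market
rolling_betas = stocks.rolling(w).cov(market).div(market.rolling(w).var(), axis=0)

# Compare the last window of one stock with the np.cov-based calculation
m_last = market.to_numpy()[-w:]
s_last = stocks['s0001'].to_numpy()[-w:]
cov_matrix = np.cov(s_last, m_last)
reference_beta = cov_matrix[0, 1] / cov_matrix[1, 1]
print(np.isclose(rolling_betas['s0001'].iloc[-1], reference_beta))
```

The first w-1 rows of rolling_betas are NaN because a full window is not yet available there.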




Response to comment

Full working example with simulated multiple dataframes
num_sec_dfs = 4000

cols = ['Open', 'High', 'Low', 'Close']
dfs = {'s{:04d}'.format(i): pd.DataFrame(np.random.rand(480, 4), dates, cols) for i in range(num_sec_dfs)}

market = pd.Series(np.random.rand(480), dates, name='Market')

df = pd.concat([market] + [dfs[k].Close.rename(k) for k in dfs.keys()], axis=1).sort_index(axis=1)

betas = roll(df.pct_change().dropna(), 12).apply(beta)

for c, col in betas.items():
    dfs[c]['Beta'] = col

dfs['s0001'].head(20)

