Compute a compounded return series in Python


Problem description

Greetings all, I have two series of data: daily raw stock price returns (positive or negative floats) and trade signals (buy=1, sell=-1, no trade=0).

The raw price returns are simply the log of today's price divided by yesterday's price:

log(p_today / p_yesterday)
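A minimal NumPy sketch of that formula, assuming the prices are held in a one-dimensional sequence ordered oldest to newest (the sample prices are made up). Each return pairs a "today" price with the previous one, so the result has one element fewer than the input:

```python
import numpy as np

def log_returns(prices):
    """Daily log returns: log(p_today / p_yesterday) for each consecutive pair."""
    prices = np.asarray(prices, dtype=float)
    # prices[1:] are the "today" values, prices[:-1] the matching "yesterday" values
    return np.log(prices[1:] / prices[:-1])

prices = [100.0, 100.63, 100.32, 100.56]  # hypothetical closing prices
print(log_returns(prices))
```

`np.diff(np.log(prices))` computes the same values, since the log of a ratio is a difference of logs.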

An example:

raw_return_series = [ 0.0063 -0.0031 0.0024 ..., -0.0221 0.0097 -0.0015]

The trade signal series looks like this:

signal_series = [-1. 0. -1. -1. 0. 0. -1. 0. 0. 0.]

To get the daily returns based on the trade signals:

daily_returns = [raw_return_series[i] * signal_series[i+1] for i in range(0, len(signal_series)-1)]
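The same pairing (return `i` with signal `i + 1`) can be written with NumPy slicing instead of a Python loop. A sketch, assuming both series fit in memory as arrays and `raw_returns` has at least `len(signals) - 1` elements; the function name and sample inputs are hypothetical:

```python
import numpy as np

def signal_returns(raw_returns, signals):
    """Vectorized form of the list comprehension: element i is raw_returns[i] * signals[i + 1]."""
    raw = np.asarray(raw_returns, dtype=float)
    sig = np.asarray(signals, dtype=float)
    n = len(sig) - 1  # same number of elements the comprehension produces
    return raw[:n] * sig[1:]

print(signal_returns([0.0063, -0.0031, 0.0024, 0.0011], [-1.0, 0.0, -1.0, -1.0]))
```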

These daily returns might look like this:

[0.0, 0.00316, -0.0024, 0.0, 0.0, 0.0023, 0.0, 0.0, 0.0] # results in daily_returns; notice the 0s

I need to use the daily_returns series to compute a compounded returns series. However, given that there are 0 values in the daily_returns series, I need to carry over the last non-zero compound return "through time" to the next non-zero compound return.

For example, I compute the compound returns like this (notice I am going "backwards" through time):

compound_returns = [(((1 + compounded[i + 1]) * (1 + daily_returns[i])) - 1) for i in range(len(compounded) - 2, -1, -1)]

And the resulting list:

[0.0, 0.0, 0.0023, 0.0, 0.0, -0.0024, 0.0031, 0.0] # (notice the 0s)
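The zeros appear because a list comprehension cannot read values it is still building: `compounded[i + 1]` refers to whatever `compounded` already contained, not to the freshly computed result. A plain loop that fills the list from the end does carry the value over, because when `daily_returns[i]` is 0 the recurrence reduces to `compounded[i] = compounded[i + 1]`. A minimal sketch, assuming the last element is seeded with the last daily return (the question does not say how `compounded` is initialised); `compound_backward` is a hypothetical name:

```python
def compound_backward(daily_returns):
    """Compound daily returns from the end of the series back to the start.

    compounded[i] = (1 + compounded[i + 1]) * (1 + daily_returns[i]) - 1,
    so a zero daily return simply carries compounded[i + 1] over to index i.
    """
    n = len(daily_returns)
    if n == 0:
        return []
    compounded = [0.0] * n
    compounded[-1] = daily_returns[-1]  # assumed seed for the last day
    for i in range(n - 2, -1, -1):  # same index order as the question's range()
        compounded[i] = (1 + compounded[i + 1]) * (1 + daily_returns[i]) - 1
    return compounded

print(compound_backward([0.0, 0.01, 0.0, -0.02]))
```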

My goal is to carry over the last non-zero return to accumulate these compound returns. That is, since the return at index i depends on the return at index i+1, the return at index i+1 should be non-zero. Every time the list comprehension encounters a zero in the daily_returns series, it essentially restarts.
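The backward recurrence from the question can also be written without a loop, using NumPy's cumulative product over the reversed array; a zero return contributes a factor of 1, so the running value is carried through automatically. Because the question's raw returns are log returns, which add across time, a cumulative-sum variant is shown as well. Which one fits depends on whether `daily_returns` is treated as simple or log returns; both function names are hypothetical:

```python
import numpy as np

def compound_backward_simple(daily_returns):
    """Treat inputs as simple returns: element i is prod(1 + r[i:]) - 1."""
    growth = 1.0 + np.asarray(daily_returns, dtype=float)
    # Reverse, take the running product, then reverse back
    return np.cumprod(growth[::-1])[::-1] - 1.0

def compound_backward_log(daily_log_returns):
    """Treat inputs as log returns: element i is exp(sum(r[i:])) - 1."""
    r = np.asarray(daily_log_returns, dtype=float)
    # Log returns add over time; expm1 converts the summed log return to a simple return
    return np.expm1(np.cumsum(r[::-1])[::-1])

daily = [0.0, 0.01, 0.0, -0.02]
print(compound_backward_simple(daily))
print(compound_backward_log(daily))
```

For the more common forward direction (compounding from the start of the series), drop the two reversals, e.g. `np.cumprod(growth) - 1.0`.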

推荐答案

There is a fantastic module called pandas, written by a guy at AQR (a hedge fund), that excels at calculations like this... what you need is a way to handle "missing data"... as someone mentioned above, the basics are using the nan (not a number) capabilities of scipy or numpy; however, even those libraries don't make financial calculations that much easier... if you use pandas, you can mark the data you don't want to consider as nan, and any later calculations will skip it while performing normal operations on the rest of the data.
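A small illustration of the NaN idea, assuming the no-trade days are marked as NaN rather than 0.0 (the numbers are made up). With the default `skipna=True`, `sum` ignores the NaN entries and `cumprod` skips them while accumulating, leaving NaN in those rows; `ffill` then carries the last compounded value through the gaps:

```python
import numpy as np
import pandas as pd

# Hypothetical strategy returns; NaN marks days with no position
daily = pd.Series([0.01, np.nan, 0.02, np.nan, -0.01])

print(daily.sum())  # NaN entries are skipped by default

# cumprod skips NaN (skipna=True by default) but leaves NaN in those rows
compounded = (1 + daily).cumprod() - 1

# Carry the last compounded value through the no-trade days
compounded = compounded.ffill()
print(compounded)
```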

I have been using pandas on my trading platform for about 8 months... I wish I had started using it sooner.

Wes (the author) gave a talk at PyCon 2010 about the capabilities of the module... see the slides and video on the PyCon 2010 webpage. In that video, he demonstrates how to get daily returns, run 1000s of linear regressions on a matrix of returns (in a fraction of a second), timestamp / graph data... all done with this module. Combined with psyco, this is a beast of a financial analysis tool.

The other great thing it handles is cross-sectional data... so you could grab daily close prices, their rolling means, etc... then timestamp every calculation, and get all this stored in something similar to a python dictionary (see the pandas.DataFrame class)... then you access slices of the data as simply as:

close_prices['stdev_5d']

See the pandas rolling moments doc for more information on how to calculate the rolling stdev (it's a one-liner).
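A sketch of storing derived columns next to the prices and reading one back by name, as in the `close_prices['stdev_5d']` example. It uses the `.rolling()` interface of current pandas releases, which has replaced the older module-level rolling-moment functions; the prices and the `close` / `mean_5d` column names are hypothetical:

```python
import pandas as pd

# Hypothetical daily closes on a business-day index
dates = pd.bdate_range("2011-03-01", periods=10)
close_prices = pd.DataFrame(
    {"close": [100.0, 101.0, 100.5, 102.0, 101.5, 103.0, 102.5, 104.0, 103.5, 105.0]},
    index=dates,
)

# Derived columns live alongside the raw prices, aligned on the same index
close_prices["mean_5d"] = close_prices["close"].rolling(window=5).mean()
close_prices["stdev_5d"] = close_prices["close"].rolling(window=5).std()  # sample std (ddof=1)

print(close_prices["stdev_5d"])
```

The first four rows of each rolling column are NaN, because a 5-day window is not yet full.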

Wes has gone out of his way to speed the module up with Cython, although I'll concede that I'm considering upgrading my server (an older Xeon) due to my analysis requirements.

EDIT FOR STRIMP's QUESTION: After you converted your code to use pandas data structures, it's still unclear to me how you're indexing your data in a pandas DataFrame, and what the compounding function requires for handling missing data (or, for that matter, the conditions for a 0.0 return... or whether you are using NaN in pandas). I will demonstrate using my own data indexing... a day was picked at random... df is a DataFrame with ES futures quotes in it... indexed per second... missing quotes are filled in with numpy.nan. The DataFrame index consists of timezone-aware datetime objects, localized with the pytz module's timezone objects.

>>> df.info
<bound method DataFrame.info of <class 'pandas.core.frame.DataFrame'>
Index: 86400 entries , 2011-03-21 00:00:00-04:00 to 2011-03-21 23:59:59-04:00
etf                                         18390  non-null values
etfvol                                      18390  non-null values
fut                                         29446  non-null values
futvol                                      23446  non-null values
...
>>> # ET is a pytz object...
>>> et
<DstTzInfo 'US/Eastern' EST-1 day, 19:00:00 STD>
>>> # To get the futures quote at 9:45, eastern time...
>>> df.xs(et.localize(dt.datetime(2011,3,21,9,45,0)))['fut']
1291.75
>>>
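For readers on a current pandas version, a rough equivalent of the lookup above on synthetic data: the index is a per-second, timezone-aware `DatetimeIndex` for one day, most quotes are missing (NaN), and the two quote values are illustrative. `DataFrame.xs` still exists, but label lookup with `.loc` is the more common idiom today:

```python
import numpy as np
import pandas as pd

# One trading day indexed per second, localized to US/Eastern
index = pd.date_range("2011-03-21 00:00:00", periods=86400, freq="s", tz="US/Eastern")
df = pd.DataFrame({"fut": np.nan}, index=index)

# Illustrative quotes at two seconds; every other second stays NaN (missing)
df.loc[pd.Timestamp("2011-03-21 09:45:00", tz="US/Eastern"), "fut"] = 1291.75
df.loc[pd.Timestamp("2011-03-21 09:55:00", tz="US/Eastern"), "fut"] = 1292.25

# Label-based lookup of the futures quote at 9:45 Eastern time
quote = df.loc[pd.Timestamp("2011-03-21 09:45:00", tz="US/Eastern"), "fut"]
print(quote)
```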

To give a simple example of how to calculate a column of continuous returns (in a pandas.TimeSeries), which reference the quote 10 minutes ago (and filling in for missing ticks), I would do this:

>>> df['fut'].ffill() / df['fut'].ffill().shift(600)

No lambda is required in that case, just dividing the column of values by itself 600 seconds ago. That .shift(600) part is because my data is indexed per-second.
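A self-contained sketch of that 10-minute ratio on synthetic per-second prices with gaps (the steady price drift and the every-7th-second gap pattern are invented so the result is easy to check). `ffill()` is the forward-fill ("pad") method in current pandas, and `shift(600)` lines each value up with the one 600 rows, here 600 seconds, earlier:

```python
import numpy as np
import pandas as pd

# Hypothetical per-second futures prices over 20 minutes (a steady drift, for easy checking)
index = pd.date_range("2011-03-21 09:30:00", periods=1200, freq="s", tz="US/Eastern")
prices = pd.Series(1290.0 + 0.01 * np.arange(1200), index=index)

# Simulate missing ticks: every 7th second (offsets 3, 10, 17, ...) has no quote
missing = np.arange(1200) % 7 == 3
prices = prices.mask(missing)

filled = prices.ffill()             # pad each gap with the last known quote
ratio = filled / filled.shift(600)  # price now / price 600 rows (= 600 s) earlier
log_ret_10min = np.log(ratio)       # the same 10-minute window as a log return
```

The first 600 values of `ratio` are NaN because there is no quote 10 minutes earlier; `ratio - 1` gives the simple 10-minute return.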

HTH, mike