Efficiently calculating point of control with pandas

Problem description

My algorithm's runtime jumped from 35 seconds to 15 minutes when I implemented this feature over a daily timeframe. The algo retrieves daily history in bulk and iterates over a subset of the dataframe (from t0 to tX, where tX is the current row of the iteration). It does this to emulate what would happen during real-time operation of the algo. I know there are ways of improving it by utilizing memory between frame calculations, but I was wondering whether there is a more pandas-ish implementation that would give an immediate benefit.

Assume that self.Step is something like 0.00001 and self.Precision is 5; they are used for binning the OHLC bar information into discrete steps for the sake of finding the POC. _frame is a subset of rows of the entire dataframe, and _low/_high correspond to that subset. The following block of code executes over the entire _frame, which could be upwards of ~250 rows, every time the algo adds a new row (when calculating a yearly timeframe on daily data). I believe it's the iterrows that's causing the major slowdown. The dataframe has columns such as high, low, open, close, volume. I am calculating the time price opportunity and volume point of control.

# Set the complete index of prices +/- 1 step due to weird floating point precision issues
volume_prices = pd.Series(0, index=np.around(np.arange(_low - self.Step, _high + self.Step, self.Step), decimals=self.Precision))
time_prices = volume_prices.copy()
for index, state in _frame.iterrows():
    _prices = np.around(np.arange(state.low, state.high, self.Step), decimals=self.Precision)
    # Evenly distribute the bar's volume over its range
    volume_prices[_prices] += state.volume / _prices.size
    # Increment time at price
    time_prices[_prices] += 1
# Pandas only returns the 1st row of the max value,
# so we need to reverse the series to find the other side
# and then find the average price between those two extremes
volume_poc = (volume_prices.idxmax() + volume_prices.iloc[::-1].idxmax()) / 2
time_poc = (time_prices.idxmax() + time_prices.iloc[::-1].idxmax()) / 2
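
As a quick illustration of the comment above (the prices below are made up): Series.idxmax only returns the label of the first maximum, so reversing the series picks out the last one, and averaging the two gives the midpoint of a flat maximum.

import pandas as pd

s = pd.Series([1, 3, 3, 3, 1], index=[1.00, 1.01, 1.02, 1.03, 1.04])
first = s.idxmax()            # 1.01 -- first label at the maximum
last = s.iloc[::-1].idxmax()  # 1.03 -- last label at the maximum
poc = (first + last) / 2      # 1.02 -- midpoint of the flat maximum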

Recommended answer

I've managed to get it down to 2 minutes instead of 15 - on daily timeframes, anyway. It's still slow on lower timeframes (10 minutes on hourly data over a 2-year period with a precision of 2 for equities). Working with DataFrames as opposed to Series was FAR slower. I'm hoping for more, but I don't know what else I can do aside from the following solution:

# Upon class instantiation, I've created attributes for each timeframe
# related to `volume_at_price` and `time_at_price`. They serve as memory
# in between frame calculations
def _prices_at(self, frame, bars=0):
    # Include 1 step above high as np.arange does not
    # include the upper limit by default
    state = frame.iloc[-min(bars + 1, frame.index.size)]
    bins = np.around(np.arange(state.low, state.high + self.Step, self.Step), decimals=self.Precision)
    return pd.Series(state.volume / bins.size, index=bins)


# SetFeature/Feature implement timeframed attributes (i.e., 'volume_at_price_D')
_v = 'volume_at_price'
_t = 'time_at_price'

# Add to x_at_price histogram
_p = self._prices_at(frame)
self.SetFeature(_v, self.Feature(_v).add(_p, fill_value=0))
self.SetFeature(_t, self.Feature(_t).add(_p * 0 + 1, fill_value=0))

# Remove old data from histogram
_p = self._prices_at(frame, self.Bars)
v = self.SetFeature(_v, self.Feature(_v).subtract(_p, fill_value=0))
t = self.SetFeature(_t, self.Feature(_t).subtract(_p * 0 + 1, fill_value=0))

self.SetFeature('volume_poc', (v.idxmax() + v.iloc[::-1].idxmax()) / 2)
self.SetFeature('time_poc', (t.idxmax() + t.iloc[::-1].idxmax()) / 2)
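
For readers without the SetFeature/Feature scaffolding, here is a minimal standalone sketch of the same rolling-histogram idea; the helper name, step size, and sample bars below are made up for illustration:

import numpy as np
import pandas as pd

step, precision = 0.01, 2

def prices_at(bar):
    # np.arange excludes the stop value, so extend one step past the high
    bins = np.around(np.arange(bar['low'], bar['high'] + step, step), decimals=precision)
    return pd.Series(bar['volume'] / bins.size, index=bins)

bars = [
    {'low': 0.98, 'high': 1.02, 'volume': 300},
    {'low': 1.00, 'high': 1.05, 'volume': 500},
]

# Build the running histograms once ...
volume_hist = pd.Series(dtype=float)
time_hist = pd.Series(dtype=float)
for bar in bars:
    p = prices_at(bar)
    volume_hist = volume_hist.add(p, fill_value=0)
    time_hist = time_hist.add(p * 0 + 1, fill_value=0)

# ... then, as the window rolls forward, subtract the bar that drops out
p = prices_at(bars[0])
volume_hist = volume_hist.subtract(p, fill_value=0)
time_hist = time_hist.subtract(p * 0 + 1, fill_value=0)

volume_poc = (volume_hist.idxmax() + volume_hist.iloc[::-1].idxmax()) / 2
time_poc = (time_hist.idxmax() + time_hist.iloc[::-1].idxmax()) / 2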
