Vectorised way to query date and price data
Problem description
Problem:
Overview:
I'm looking for a vectorised way to get the first date on which a certain condition is seen. The condition is met when the price in dfDays reaches (>=) the target price specified in dfWeeks.target. This condition has to be hit after the date the target was set.
Is there a way to do the following time series analysis in a vectorised way in Pandas, with apply or similar?
Data:
Generate a test dataframe with freq='D':
import numpy as np
import pandas as pd
np.random.seed(seed=1)
rng = pd.date_range('1/1/2000', '2000-07-31', freq='D')
weeks = np.random.uniform(low=1.03, high=3, size=(len(rng),))
ts2 = pd.Series(weeks, index=rng)
dfDays = pd.DataFrame({'price': ts2})
Now create a resampled freq='1W-Mon' dataframe:
dfWeeks = dfDays.resample('1W-Mon').first()
dfWeeks['target'] = (dfWeeks['price'] + .5).round(2)
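A note on the resample step, as a runnable sketch: for weekly 'W-MON' rules pandas defaults to closed='right' and label='right', so the row labelled 2000-01-03 summarises the bin (1999-12-27, 2000-01-03] and first() picks the 2000-01-01 price rather than the Monday price. The snippet rebuilds the question's data (written as 'W-MON', equivalent to the question's '1W-Mon') and checks this; whether it matters for your target logic is for you to decide.

```python
import numpy as np
import pandas as pd

# Rebuild the question's daily test data.
np.random.seed(seed=1)
rng = pd.date_range('2000-01-01', '2000-07-31', freq='D')
dfDays = pd.DataFrame(
    {'price': np.random.uniform(low=1.03, high=3, size=len(rng))}, index=rng)

# Weekly bins anchored on Monday; for 'W-*' rules pandas defaults to
# closed='right' and label='right', so each bin ends on (and is named by) a Monday.
dfWeeks = dfDays.resample('W-MON').first()

first_label = dfWeeks.index[0]           # the first bin's label, 2000-01-03
first_price = dfWeeks['price'].iloc[0]   # price of 2000-01-01, the earliest day in that bin
print(first_label, first_price == dfDays['price'].loc[pd.Timestamp('2000-01-01')])
```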
Use reindex to align the index of both dataframes:
dfWeeks = dfWeeks.reindex(dfDays.index)
So dfWeeks is a dataframe containing the target values we will use:
dfWeeks.dropna().head()
price target
2000-01-03 1.851533 2.35
2000-01-10 1.625595 2.13
2000-01-17 1.855813 2.36
2000-01-24 2.130619 2.63
2000-01-31 2.756487 3.26
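To see what the reindex step does, here is a minimal sketch on two hand-made weekly targets (dates and values copied from the table above; the frame names weekly and aligned are illustrative). Reindexing onto a daily index keeps the weekly rows and fills every other day with NaN, which is why dropna() is used to show only the weekly rows.

```python
import pandas as pd

# Two weekly targets taken from the table above (illustrative frame).
weekly = pd.DataFrame({'target': [2.35, 2.13]},
                      index=pd.to_datetime(['2000-01-03', '2000-01-10']))

# Align onto a daily index, as dfWeeks.reindex(dfDays.index) does.
daily_index = pd.date_range('2000-01-03', '2000-01-10', freq='D')
aligned = weekly.reindex(daily_index)

print(len(aligned))                      # 8 daily rows
print(aligned['target'].notna().sum())   # only 2 of them carry a target
print(aligned.dropna())                  # back to the two weekly rows
```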
If we focus on the first target in dfWeeks (2000-01-03):
match = dfDays[dfDays.price >= dfWeeks.target.loc['2000-01-03']]
The first match (2000-01-02) is before the target was set, so it is not valid; the 2000-01-12 entry is the first valid match:
match.head()
price
2000-01-02 2.449039
2000-01-12 2.379882
2000-01-14 2.759891
2000-01-16 2.350821
2000-01-21 2.607467
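The "strictly after the set date" rule can be added to the single-target filter as a second boolean mask. A minimal sketch, rebuilding the question's data and hard-coding the 2000-01-03 target of 2.35 from the table above; the names set_date, valid and first_hit are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the question's daily test data.
np.random.seed(seed=1)
rng = pd.date_range('2000-01-01', '2000-07-31', freq='D')
dfDays = pd.DataFrame(
    {'price': np.random.uniform(low=1.03, high=3, size=len(rng))}, index=rng)

set_date = pd.Timestamp('2000-01-03')  # date the target was set
target = 2.35                          # dfWeeks.target on 2000-01-03 (table above)

# Keep only days strictly after set_date whose price reaches the target.
valid = dfDays[(dfDays.index > set_date) & (dfDays['price'] >= target)]
first_hit = valid.index[0] if not valid.empty else pd.NaT
print(first_hit)  # 2000-01-12 00:00:00
```

This still handles one target at a time; the answer generalises it to every row of dfWeeks.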
Is there a way to do this with apply or similar for all target entries in dfWeeks, in a vectorised way?
Desired output:
price target target_hit
2000-01-03 1.851533 2.35 2000-01-12
2000-01-10 1.625595 2.13 2000-01-12
2000-01-17 1.855813 2.36 2000-01-21
2000-01-24 2.130619 2.63 2000-01-25
2000-01-31 2.756487 3.26 nan
Recommended answer
pandas
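# dfWeeks here holds one row per week (as in the output below), i.e. without the reindex step
def find_match(x):
    # x is one row of dfWeeks: x.name is its date, x.target its target price
    match = dfDays.query('index > @x.name & price >= @x.target')
    if not match.empty:
        return match.index[0]  # first matching day, since dfDays is sorted by date
    # otherwise returns None (no later day reaches the target)
dfWeeks.assign(target_hit=dfWeeks.apply(find_match, axis=1))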
numpy
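# Rows = weekly targets, columns = days: both comparisons broadcast to (n_weeks, n_days)
timing = dfWeeks.index.values[:, None] < dfDays.index.values        # day is strictly after the target date
target_hit = dfWeeks.target.values[:, None] <= dfDays.price.values  # day's price reaches the target
matches = timing & target_hit
got_match = matches.any(axis=1)             # weeks with at least one valid day
first = matches.argmax(axis=1)[got_match]   # column of the first True in each of those rows
dfWeeks.loc[got_match, 'target_hit'] = dfDays.index.values[first]
dfWeeks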
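The numpy version relies on broadcasting and on argmax returning the position of the first True in each row. Because argmax also returns 0 for a row with no True at all, the got_match mask is what separates "the first day matched" from "nothing matched". A small sketch on made-up toy dates and prices (not the question's data; all names are illustrative):

```python
import numpy as np

# Toy data: 2 targets (rows) and 4 daily prices (columns).
week_dates = np.array(['2000-01-03', '2000-01-05'], dtype='datetime64[D]')
targets = np.array([2.0, 5.0])
day_dates = np.array(['2000-01-02', '2000-01-04', '2000-01-06', '2000-01-07'],
                     dtype='datetime64[D]')
prices = np.array([3.0, 1.0, 2.5, 4.0])

# (2, 1) against (4,) broadcasts to a (2, 4) boolean matrix.
after = week_dates[:, None] < day_dates
hit = targets[:, None] <= prices
matches = after & hit                 # row 0: [F, F, T, T]; row 1: all False

got_match = matches.any(axis=1)       # [True, False]
first = matches.argmax(axis=1)        # [2, 0]; the 0 for row 1 is meaningless
result = np.where(got_match, day_dates[first], np.datetime64('NaT'))
print(result)                         # ['2000-01-06' 'NaT']
```

For the question's data the matrix is 31 weeks × 213 days, which is small; memory grows with the product of the two lengths, so very long histories may need chunking.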
Both the pandas and the numpy approaches yield:
price target target_hit
2000-01-03 1.851533 2.35 2000-01-12
2000-01-10 1.625595 2.13 2000-01-12
2000-01-17 1.855813 2.36 2000-01-21
2000-01-24 2.130619 2.63 2000-01-25
2000-01-31 2.756487 3.26 NaT
2000-02-07 1.859582 2.36 2000-02-09
2000-02-14 1.066028 1.57 2000-02-15
2000-02-21 1.912350 2.41 2000-03-09
2000-02-28 1.446907 1.95 2000-02-29
2000-03-06 2.408524 2.91 2000-03-28
2000-03-13 2.337675 2.84 2000-03-17
2000-03-20 2.620561 3.12 NaT
2000-03-27 2.770113 3.27 NaT
2000-04-03 2.930735 3.43 NaT
2000-04-10 1.834030 2.33 2000-04-12
2000-04-17 2.068304 2.57 2000-04-19
2000-04-24 2.391067 2.89 2000-05-11
2000-05-01 2.518262 3.02 NaT
2000-05-08 1.085764 1.59 2000-05-10
2000-05-15 1.579992 2.08 2000-05-16
2000-05-22 2.619997 3.12 NaT
2000-05-29 1.269047 1.77 2000-05-31
2000-06-05 1.171789 1.67 2000-06-06
2000-06-12 2.175277 2.68 2000-06-20
2000-06-19 1.338879 1.84 2000-06-20
2000-06-26 2.977574 3.48 NaT
2000-07-03 1.160680 1.66 2000-07-04
2000-07-10 2.615366 3.12 NaT
2000-07-17 2.478080 2.98 NaT
2000-07-24 2.899562 3.40 NaT
2000-07-31 2.220492 2.72 NaT
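To check that both approaches agree, here is a sketch that rebuilds the question's data, keeps dfWeeks at one row per week (no reindex), and compares the two results row by row. It swaps the answer's query() call for an equivalent boolean mask and uses np.where instead of the .loc assignment, so treat it as a re-implementation of the same logic rather than the answer's exact code.

```python
import numpy as np
import pandas as pd

# Rebuild the question's data; dfWeeks stays at one row per week (no reindex).
np.random.seed(seed=1)
rng = pd.date_range('2000-01-01', '2000-07-31', freq='D')
dfDays = pd.DataFrame(
    {'price': np.random.uniform(low=1.03, high=3, size=len(rng))}, index=rng)
dfWeeks = dfDays.resample('W-MON').first()
dfWeeks['target'] = (dfWeeks['price'] + .5).round(2)


def find_match(x):
    # Boolean-mask stand-in for the answer's query() call.
    match = dfDays[(dfDays.index > x.name) & (dfDays['price'] >= x['target'])]
    return match.index[0] if not match.empty else pd.NaT


via_apply = dfWeeks.apply(find_match, axis=1)

# Broadcasting version of the answer's numpy approach.
matches = ((dfWeeks.index.values[:, None] < dfDays.index.values)
           & (dfWeeks['target'].values[:, None] <= dfDays['price'].values))
got_match = matches.any(axis=1)
first = matches.argmax(axis=1)
via_numpy = pd.Series(
    np.where(got_match, dfDays.index.values[first], np.datetime64('NaT')),
    index=dfWeeks.index)

# Every row must be NaT in both results, or the same date in both.
agree = all((pd.isna(a) and pd.isna(b)) or a == b
            for a, b in zip(via_apply, via_numpy))
print(agree)
```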