Vectorised way to query date and price data

Question

Overview:

I'm looking for a vectorised way to get the first date on which a certain condition is met. The condition is met when the price in dfDays reaches the target price specified in dfWeeks.target (price >= target, as in the code below), and it has to be hit strictly after the date the target was set.
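Stated precisely (the notation is ours, not the asker's): for a target with value τ_w set on date w, daily prices p_d, and D the set of dates in dfDays.index, the wanted date is

```latex
\mathrm{target\_hit}(w) = \min\{\, d \in D : d > w \ \wedge\ p_d \ge \tau_w \,\}, \qquad \min\varnothing = \mathrm{NaT}
```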

Is there a way to do the following time series analysis, with apply or similar, in a vectorised way in Pandas?

Data:

Generate a freq='D' test dataframe:

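import numpy as np
import pandas as pd

np.random.seed(seed=1)
# One row per calendar day from 2000-01-01 to 2000-07-31
rng = pd.date_range('1/1/2000', '2000-07-31', freq='D')
# Random daily prices drawn uniformly from [1.03, 3)
prices = np.random.uniform(low=1.03, high=3, size=(len(rng),))
ts2 = pd.Series(prices, index=rng)
dfDays = pd.DataFrame({'price': ts2})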

Now create a weekly dataframe resampled with freq='1W-Mon', with a target price 0.5 above each weekly price:

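# One row per week, labelled by the Monday that closes the week
dfWeeks = dfDays.resample('1W-Mon').first()
# Target price: the weekly price plus 0.5, rounded to 2 decimals
dfWeeks['target'] = (dfWeeks['price'] + .5).round(2)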
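The resample and reindex steps can be checked on a tiny frame. This is a sketch on toy data of our own (days, daily, weekly and aligned are illustrative names): for weekly anchors pandas defaults to closed='right' and label='right', so each 1W-Mon row holds the first price of the week ending on that Monday, not necessarily the Monday's own price, and after reindexing onto the daily index every non-Monday row is NaN.

```python
import pandas as pd

# Toy data (ours): Monday 2000-01-03 through Wednesday 2000-01-12,
# prices 0..9 so each value identifies its day
days = pd.date_range('2000-01-03', periods=10, freq='D')
daily = pd.DataFrame({'price': [float(i) for i in range(10)]}, index=days)

# Bins are (previous Monday, Monday], labelled by the closing Monday
weekly = daily.resample('1W-Mon').first()
print(weekly['price'].tolist())         # [0.0, 1.0, 8.0] at 01-03, 01-10, 01-17

# Reindexing onto the daily index leaves NaN on every non-Monday row
aligned = weekly.reindex(days)
print(aligned['price'].notna().sum())   # 2
```

The 2000-01-10 row holds 1.0 (the price of 2000-01-04) because that bin covers 2000-01-04 through 2000-01-10.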

Use reindex to align dfWeeks to the daily index of dfDays:

dfWeeks = dfWeeks.reindex(dfDays.index)

So dfWeeks is a dataframe containing the target values we will use. After the reindex, every day that is not a Monday is NaN, hence the dropna() when inspecting it:

dfWeeks.dropna().head()

               price    target
2000-01-03  1.851533    2.35
2000-01-10  1.625595    2.13
2000-01-17  1.855813    2.36
2000-01-24  2.130619    2.63
2000-01-31  2.756487    3.26

If we focus on the first target in dfWeeks, set on 2000-01-03:

match = dfDays[dfDays.price >= dfWeeks.target.loc['2000-01-03']]

The first match (2000-01-02) is before the target was set on 2000-01-03, so it is not valid; the 2000-01-12 entry is the first valid match:

match.head()

            price
2000-01-02  2.449039
2000-01-12  2.379882
2000-01-14  2.759891
2000-01-16  2.350821
2000-01-21  2.607467
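For this single target, the missing time condition can be added by hand. A minimal sketch that rebuilds the question's data with the same seed; set_on, target, valid and first_hit are our own illustrative names:

```python
import numpy as np
import pandas as pd

# Rebuild the question's daily and weekly data (same seed, same draws)
np.random.seed(seed=1)
rng = pd.date_range('1/1/2000', '2000-07-31', freq='D')
dfDays = pd.DataFrame(
    {'price': np.random.uniform(low=1.03, high=3, size=(len(rng),))}, index=rng)
dfWeeks = dfDays.resample('1W-Mon').first()
dfWeeks['target'] = (dfWeeks['price'] + .5).round(2)

set_on = pd.Timestamp('2000-01-03')      # date the target was set
target = dfWeeks.target.loc[set_on]      # 2.35

# Price condition alone: this is the question's `match`
match = dfDays[dfDays.price >= target]
# Add the time condition: keep only days strictly after set_on
valid = match[match.index > set_on]
first_hit = valid.index[0] if not valid.empty else pd.NaT
print(first_hit)                         # 2000-01-12 00:00:00
```

Doing this for every Monday is exactly the per-target loop the question wants to vectorise.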

Is there a way to do this for every target entry in dfWeeks, with apply or similar, in a vectorised way?

Desired output:

               price    target  target_hit
2000-01-03  1.851533    2.35    2000-01-12
2000-01-10  1.625595    2.13    2000-01-12 
2000-01-17  1.855813    2.36    2000-01-21
2000-01-24  2.130619    2.63    2000-01-25
2000-01-31  2.756487    3.26    nan

Recommended answer

pandas

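def find_match(x):
    # Days strictly after the date the target was set (x.name)
    # whose price reaches the target
    match = dfDays.query('index > @x.name & price >= @x.target')
    if not match.empty:
        return match.index[0]
    return pd.NaT  # the target is never hit within the data

# Only the Monday rows carry a target; the reindexed daily rows are NaN
weekly = dfWeeks.dropna()
weekly.assign(target_hit=weekly.apply(find_match, axis=1))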
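Note that apply with axis=1 calls find_match once per row in Python, so this version is convenient rather than truly vectorised. The sketch below replays its logic on toy frames of our own (days and weeks are hypothetical), swapping the query string for an equivalent boolean mask. It shows that a price on the target date itself does not count and that a never-hit target gives NaT:

```python
import pandas as pd

# Hypothetical toy data: four daily prices and two weekly targets
days = pd.DataFrame(
    {'price': [3.0, 1.0, 2.5, 1.2]},
    index=pd.date_range('2000-01-03', periods=4, freq='D'),
)
weeks = pd.DataFrame(
    {'target': [2.0, 5.0]},
    index=pd.to_datetime(['2000-01-03', '2000-01-04']),
)

def find_match(row):
    # Same rule as the query string: strictly after the date the
    # target was set (row.name) and price >= target
    mask = (days.index > row.name) & (days['price'].to_numpy() >= row['target'])
    hits = days.index[mask]
    return hits[0] if len(hits) else pd.NaT

result = weeks.assign(target_hit=weeks.apply(find_match, axis=1))
print(result)
```

The 2000-01-03 target of 2.0 is hit on 2000-01-05, not on 2000-01-03 itself (price 3.0) because of the strict date comparison; the 5.0 target is never reached.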

numpy

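import numpy as np

weekly = dfWeeks.dropna().copy()
# (weeks x days) boolean grids built by broadcasting
timing = weekly.index.values[:, None] < dfDays.index.values        # day is after the target date
target_hit = weekly.target.values[:, None] <= dfDays.price.values  # price reaches the target
matches = timing & target_hit
got_match = matches.any(axis=1)              # weeks with at least one valid day
first = matches.argmax(axis=1)[got_match]    # column of the first True in each such week

weekly.loc[got_match, 'target_hit'] = dfDays.index.values[first]
weekly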
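The broadcasting trick is easiest to see on tiny arrays. In this sketch (toy data of our own; week_dates, targets, day_dates and prices are hypothetical names), argmax returns the index of the first True in each row, but it also returns 0 for a row with no True at all, which is why the answer filters with got_match before using it:

```python
import numpy as np

# Toy data (ours): three target dates/values and five daily prices
week_dates = np.array(['2000-01-03', '2000-01-05', '2000-01-06'], dtype='datetime64[D]')
targets = np.array([2.0, 1.5, 9.0])
day_dates = np.array(['2000-01-02', '2000-01-03', '2000-01-04',
                      '2000-01-05', '2000-01-06'], dtype='datetime64[D]')
prices = np.array([2.5, 1.0, 1.6, 2.1, 1.7])

# Broadcasting a (3, 1) column against a (5,) row gives (3, 5) grids
timing = week_dates[:, None] < day_dates
target_hit = targets[:, None] <= prices
matches = timing & target_hit

got_match = matches.any(axis=1)   # [ True  True False]
first = matches.argmax(axis=1)    # [3 4 0]: row 2 has no True, yet argmax says 0
hits = day_dates[first[got_match]]
print(hits)                       # ['2000-01-05' '2000-01-06']
```

Each grid has one cell per (target, day) pair, so memory grows with the number of targets times the number of days; for the question's 31 Mondays by 213 days that is negligible.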


Both yield:

               price  target target_hit
2000-01-03  1.851533    2.35 2000-01-12
2000-01-10  1.625595    2.13 2000-01-12
2000-01-17  1.855813    2.36 2000-01-21
2000-01-24  2.130619    2.63 2000-01-25
2000-01-31  2.756487    3.26        NaT
2000-02-07  1.859582    2.36 2000-02-09
2000-02-14  1.066028    1.57 2000-02-15
2000-02-21  1.912350    2.41 2000-03-09
2000-02-28  1.446907    1.95 2000-02-29
2000-03-06  2.408524    2.91 2000-03-28
2000-03-13  2.337675    2.84 2000-03-17
2000-03-20  2.620561    3.12        NaT
2000-03-27  2.770113    3.27        NaT
2000-04-03  2.930735    3.43        NaT
2000-04-10  1.834030    2.33 2000-04-12
2000-04-17  2.068304    2.57 2000-04-19
2000-04-24  2.391067    2.89 2000-05-11
2000-05-01  2.518262    3.02        NaT
2000-05-08  1.085764    1.59 2000-05-10
2000-05-15  1.579992    2.08 2000-05-16
2000-05-22  2.619997    3.12        NaT
2000-05-29  1.269047    1.77 2000-05-31
2000-06-05  1.171789    1.67 2000-06-06
2000-06-12  2.175277    2.68 2000-06-20
2000-06-19  1.338879    1.84 2000-06-20
2000-06-26  2.977574    3.48        NaT
2000-07-03  1.160680    1.66 2000-07-04
2000-07-10  2.615366    3.12        NaT
2000-07-17  2.478080    2.98        NaT
2000-07-24  2.899562    3.40        NaT
2000-07-31  2.220492    2.72        NaT