基于两列中的匹配值的日期差- pandas [英] Date Difference based on matching values in two columns - Pandas

查看:77
本文介绍了基于两列中的匹配值的日期差- pandas 的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个数据框,我正在努力根据其他列创建一个列,我将分享示例数据的问题。

I have a dataframe, I am struggling to create a column based out of other columns, I will share the problem for a sample data.

          Date  Target1      Close
0   2018-05-25  198.0090    188.580002
1   2018-05-25  197.6835    188.580002
2   2018-05-25  198.0090    188.580002
3   2018-05-29  196.6230    187.899994
4   2018-05-29  196.9800    187.899994
5   2018-05-30  197.1375    187.500000
6   2018-05-30  196.6965    187.500000
7   2018-05-30  196.8750    187.500000
8   2018-05-31  196.2135    186.869995
9   2018-05-31  196.2135    186.869995
10  2018-05-31  196.5600    186.869995
11  2018-05-31  196.7700    186.869995
12  2018-05-31  196.9275    186.869995
13  2018-05-31  196.2135    186.869995
14  2018-05-31  196.2135    186.869995
15  2018-06-01  197.2845    190.240005
16  2018-06-01  197.2845    190.240005
17  2018-06-04  201.2325    191.830002
18  2018-06-04  201.4740    191.830002

我想创建另一列(针对每个观察)(例如,称为days_to_hit_target)这是接近点击(或越过特定日期的目标)的天数之差,然后计算天数之差并将其放在days_to_hit_target列中。

I want to create another column (for each observation) (called days_to_hit_target for example) which is the difference of days such that close hits (or crosses target of specific day), then it counts the difference of days and put them in the column days_to_hit_target.

想法是,假设今天2018年6月25日的收盘价为188.58,那么,我想获取这个目标(198.0090)达到收盘价的日期,该日期是在2018年6月4日晚些时候收盘价达到首次观察的目标(198.0090),该目标将被馈送到该列的第一次观察(days_to_hit_target)。

The idea is, suppose close price today in 2018-05-25 is 188.58, so, I want to get the date for which this target (198.0090) is hit close which it is doing somewhere later on 2018-06-04, where close has reached to the target of first observation, (198.0090), that will be fed to the first observation of the column (days_to_hit_target ).

推荐答案

import pandas as pd

csv = pd.read_csv(
    'sample.csv',
    parse_dates=['Date']
)

csv.sort_values('Date', inplace=True)

def find_closest(row):

    target = row['Target1']
    date = row['Date']

    matches = csv[
        (csv['Close'] >= target) &
        (csv['Date'] > date)
    ]

    closest_date = matches['Date'].iloc[0] if not matches.empty else None

    row['days to hit target'] = (closest_date - date).days if closest_date else None

    return row


final = csv.apply(find_closest, axis=1)

测试有点困难,因为没有一个目标出现在结尾处。但是这个想法很简单。设置原始框架的子帧,以使 date 在当前行日期之后,并且 Close 大于或等于 Target1 并获得第一个条目(这是在使用 df.sort_values 对它进行排序之后。

It's a bit hard to test because none of the targets appear in the close. But the idea is simple. Subset your original frame such that date is after the current row date and Close is greater than or equal to Target1 and get the first entry (this is after you've sorted it using df.sort_values.

如果子集为空,则使用None;否则,使用日期达到目标的天数当时非常简单。

If the subset is empty, use None. Otherwise, use the Date. Days to hit target is pretty simple at that point.

这篇关于基于两列中的匹配值的日期差- pandas 的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆