基于两列中的匹配值的日期差- pandas [英] Date Difference based on matching values in two columns - Pandas
问题描述
我有一个数据框,我正在努力根据其他列创建一个列,我将分享示例数据的问题。
I have a dataframe, I am struggling to create a column based out of other columns, I will share the problem for a sample data.
Date Target1 Close
0 2018-05-25 198.0090 188.580002
1 2018-05-25 197.6835 188.580002
2 2018-05-25 198.0090 188.580002
3 2018-05-29 196.6230 187.899994
4 2018-05-29 196.9800 187.899994
5 2018-05-30 197.1375 187.500000
6 2018-05-30 196.6965 187.500000
7 2018-05-30 196.8750 187.500000
8 2018-05-31 196.2135 186.869995
9 2018-05-31 196.2135 186.869995
10 2018-05-31 196.5600 186.869995
11 2018-05-31 196.7700 186.869995
12 2018-05-31 196.9275 186.869995
13 2018-05-31 196.2135 186.869995
14 2018-05-31 196.2135 186.869995
15 2018-06-01 197.2845 190.240005
16 2018-06-01 197.2845 190.240005
17 2018-06-04 201.2325 191.830002
18 2018-06-04 201.4740 191.830002
我想创建另一列(针对每个观察)(例如,称为days_to_hit_target)这是接近点击(或越过特定日期的目标)的天数之差,然后计算天数之差并将其放在days_to_hit_target列中。
I want to create another column (for each observation) (called days_to_hit_target for example) which is the difference of days such that close hits (or crosses target of specific day), then it counts the difference of days and put them in the column days_to_hit_target.
想法是,假设今天2018年6月25日的收盘价为188.58,那么,我想获取这个目标(198.0090)达到收盘价的日期,该日期是在2018年6月4日晚些时候收盘价达到首次观察的目标(198.0090),该目标将被馈送到该列的第一次观察(days_to_hit_target)。
The idea is, suppose close price today in 2018-05-25 is 188.58, so, I want to get the date for which this target (198.0090) is hit close which it is doing somewhere later on 2018-06-04, where close has reached to the target of first observation, (198.0090), that will be fed to the first observation of the column (days_to_hit_target ).
推荐答案
import pandas as pd
csv = pd.read_csv(
'sample.csv',
parse_dates=['Date']
)
csv.sort_values('Date', inplace=True)
def find_closest(row):
target = row['Target1']
date = row['Date']
matches = csv[
(csv['Close'] >= target) &
(csv['Date'] > date)
]
closest_date = matches['Date'].iloc[0] if not matches.empty else None
row['days to hit target'] = (closest_date - date).days if closest_date else None
return row
final = csv.apply(find_closest, axis=1)
测试有点困难,因为没有一个目标出现在结尾处。但是这个想法很简单。设置原始框架的子帧,以使 date
在当前行日期之后,并且 Close
大于或等于 Target1
并获得第一个条目(这是在使用 df.sort_values
对它进行排序之后。
It's a bit hard to test because none of the targets appear in the close. But the idea is simple. Subset your original frame such that date
is after the current row date and Close
is greater than or equal to Target1
and get the first entry (this is after you've sorted it using df.sort_values
.
如果子集为空,则使用None;否则,使用日期
。达到目标的天数
当时非常简单。
If the subset is empty, use None. Otherwise, use the Date
. Days to hit target
is pretty simple at that point.
这篇关于基于两列中的匹配值的日期差- pandas 的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!