Pandas Dataframe - for each row, return count of other rows with overlapping dates
Problem description
I've got a dataframe with projects, start dates, and end dates. For each row I would like to return the number of other projects in process when the project started. How do you nest loops when using df.apply()? I've tried using a for loop, but my dataframe is large and it takes way too long.
import datetime as dt
import pandas as pd

data = {'project': ['A', 'B', 'C'],
        'pr_start_date': [dt.datetime(2018, 9, 1), dt.datetime(2019, 4, 1), dt.datetime(2019, 6, 8)],
        'pr_end_date': [dt.datetime(2019, 6, 15), dt.datetime(2019, 12, 1), dt.datetime(2019, 8, 1)]}
df = pd.DataFrame(data)

def cons_overlap(start):
    # Count the projects that were already in process at `start`.
    overlaps = 0
    for i in df.index:
        other_start = df.loc[i, 'pr_start_date']
        other_end = df.loc[i, 'pr_end_date']
        if other_start < start < other_end:
            overlaps += 1
    return overlaps

df['overlap'] = df.apply(lambda row: cons_overlap(row['pr_start_date']), axis=1)
This is the output I'm looking for:
  project pr_start_date pr_end_date  overlap
0       A    2018-09-01  2019-06-15        0
1       B    2019-04-01  2019-12-01        1
2       C    2019-06-08  2019-08-01        2
Recommended answer
I suggest you take advantage of vectorized pairwise comparisons.
Output:
  project pr_start_date pr_end_date  overlap
0       A    2018-09-01  2019-06-15        0
1       B    2019-04-01  2019-12-01        1
2       C    2019-06-08  2019-08-01        2
Both ends and starts are 3x3 boolean matrices that are True wherever the condition is met:
# ends
[[ True True True]
[ True True True]
[ True True True]]
# starts
[[False True True]
[False False True]
[False False False]]
Then take the element-wise intersection with the logical & and sum along axis 0 (sum(0)), which gives one count per column, that is, one count per project.
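The steps described above can be sketched as follows. This is a reconstruction under stated assumptions, not necessarily the answerer's exact code: it assumes the two matrices are built with NumPy broadcasting, and the helper names start and end are introduced here for illustration. The comparisons are chosen so that they reproduce the ends and starts matrices shown above.

```python
import datetime as dt

import pandas as pd

df = pd.DataFrame({
    "project": ["A", "B", "C"],
    "pr_start_date": [dt.datetime(2018, 9, 1), dt.datetime(2019, 4, 1), dt.datetime(2019, 6, 8)],
    "pr_end_date": [dt.datetime(2019, 6, 15), dt.datetime(2019, 12, 1), dt.datetime(2019, 8, 1)],
})

# Illustrative helper names (assumed, not from the original answer).
start = df["pr_start_date"].to_numpy()
end = df["pr_end_date"].to_numpy()

# ends[i, j] is True when project j starts before project i ends.
ends = start < end[:, None]
# starts[i, j] is True when project j starts after project i starts.
starts = start > start[:, None]

# Column j counts the projects i that were already running when j started.
df["overlap"] = (ends & starts).sum(0)
print(df)
```

Because both comparisons are strict, a project never counts itself. Note that this builds two n x n boolean arrays, so memory use grows quadratically with the number of rows.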