Count overlapping time frames in a pandas dataframe, grouped by person
Problem description
I'm using the top solution here to determine the number of rows that have start and end times overlapping with the given row. However, I need these overlaps to be determined by groups and not across the whole dataframe.
The data I'm working with has start and end times for conversations and the name of the person involved:
id start_time end_time name
1 2021-02-10 10:37:35 2021-02-10 12:16:22 Bob
2 2021-02-10 11:09:39 2021-02-10 13:06:25 Bob
3 2021-02-10 12:10:33 2021-02-10 17:06:26 Bob
4 2021-02-10 15:05:08 2021-02-10 21:07:05 Sally
5 2021-02-10 21:07:26 2021-02-10 21:26:37 Sally
This is the solution from the previous post:
ends = df['start_time'].values < df['end_time'].values[:, None]
starts = df['start_time'].values > df['start_time'].values[:, None]
df['overlap'] = (ends & starts).sum(0)
df
But this also counts the overlap between conversations 3 and 4, which belong to different people, whereas I only want overlaps counted within conversations 1-3 (Bob) or within 4-5 (Sally).
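To see where the cross-person count comes from, the pairwise mask can be inspected directly. The sketch below rebuilds the five sample rows by hand (the frame construction and the names mask and pairs are illustrative, not from the original post) and lists every pair of conversation ids that the two conditions flag.

```python
import numpy as np
import pandas as pd

# Sample rows from the question, rebuilt by hand for illustration.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "start_time": pd.to_datetime([
        "2021-02-10 10:37:35", "2021-02-10 11:09:39", "2021-02-10 12:10:33",
        "2021-02-10 15:05:08", "2021-02-10 21:07:26",
    ]),
    "end_time": pd.to_datetime([
        "2021-02-10 12:16:22", "2021-02-10 13:06:25", "2021-02-10 17:06:26",
        "2021-02-10 21:07:05", "2021-02-10 21:26:37",
    ]),
    "name": ["Bob", "Bob", "Bob", "Sally", "Sally"],
})

start = df["start_time"].to_numpy()
end = df["end_time"].to_numpy()

# mask[i, j] is True when row j starts strictly inside row i's interval.
mask = (start < end[:, None]) & (start > start[:, None])

# Translate the flagged positions back to conversation ids.
rows, cols = np.nonzero(mask)
ids = df["id"].to_numpy()
pairs = list(zip(ids[rows].tolist(), ids[cols].tolist()))
print(pairs)  # [(1, 2), (1, 3), (2, 3), (3, 4)]
```

The pair (3, 4) is the unwanted one: conversation 4 belongs to Sally but starts while Bob's conversation 3 is still running.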
What I'm currently getting:
id start_time end_time name overlap
1 2021-02-10 10:37:35 2021-02-10 12:16:22 Bob 2
2 2021-02-10 11:09:39 2021-02-10 13:06:25 Bob 1
3 2021-02-10 12:10:33 2021-02-10 17:06:26 Bob 1
4 2021-02-10 15:05:08 2021-02-10 21:07:05 Sally 1
5 2021-02-10 21:07:26 2021-02-10 21:26:37 Sally 0
What I'd like to get:
id start_time end_time name overlap
1 2021-02-10 10:37:35 2021-02-10 12:16:22 Bob 2
2 2021-02-10 11:09:39 2021-02-10 13:06:25 Bob 1
3 2021-02-10 12:10:33 2021-02-10 17:06:26 Bob 0
4 2021-02-10 15:05:08 2021-02-10 21:07:05 Sally 1
5 2021-02-10 21:07:26 2021-02-10 21:26:37 Sally 0
Recommended answer
I think this might give what you need.
Add an extra & condition that also requires the names to match:
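# ends[i, j]: row j starts before row i ends
ends = df['start_time'].values < df['end_time'].values[:, None]
# starts[i, j]: row j starts after row i starts
starts = df['start_time'].values > df['start_time'].values[:, None]
# same_group[i, j]: rows i and j belong to the same person
same_group = (df['name'].values == df['name'].values[:, None])
# sum across axis=1 !!! (for each row i, count the rows j that start inside it)
df['overlap'] = (ends & starts & same_group).sum(1)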
df
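For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea on the five sample rows. The frame is rebuilt by hand, and to_numpy() is used instead of .values as an assumption to keep the name column a plain NumPy array across pandas versions; the variable names inside and same_group are illustrative.

```python
import pandas as pd

# Sample rows from the question, rebuilt by hand for illustration.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "start_time": pd.to_datetime([
        "2021-02-10 10:37:35", "2021-02-10 11:09:39", "2021-02-10 12:10:33",
        "2021-02-10 15:05:08", "2021-02-10 21:07:26",
    ]),
    "end_time": pd.to_datetime([
        "2021-02-10 12:16:22", "2021-02-10 13:06:25", "2021-02-10 17:06:26",
        "2021-02-10 21:07:05", "2021-02-10 21:26:37",
    ]),
    "name": ["Bob", "Bob", "Bob", "Sally", "Sally"],
})

start = df["start_time"].to_numpy()
end = df["end_time"].to_numpy()
name = df["name"].to_numpy()

# inside[i, j]: row j starts strictly inside row i's interval.
inside = (start < end[:, None]) & (start > start[:, None])
# same_group[i, j]: rows i and j have the same name.
same_group = name == name[:, None]

# For each row i, count the same-person rows that start inside it.
df["overlap"] = (inside & same_group).sum(axis=1)
print(df[["id", "name", "overlap"]])
```

On the rows exactly as listed in the question, this gives 2, 1, 0, 0, 0. Conversation 4 comes out as 0 rather than 1 because conversation 5 starts at 21:07:26, after conversation 4 has already ended at 21:07:05. Note also that the masks are n x n, so memory grows quadratically with the number of rows; for large frames it may be worth building them one person at a time instead.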