查找组中个人的开始时间和结束时间之间的重叠时间 [英] finding overlapping time between start time and end time of individuals in a group
问题描述
我有
household person start time end time
1 1 07:45:00 21:45:00
1 2 09:45:00 17:45:00
1 3 22:45:00 23:45:00
1 4 08:45:00 01:45:00
1 1 06:45:00 19:45:00
2 1 07:45:00 21:45:00
2 2 016:45:00 22:45:00
我想找到一列来查找家庭成员之间的重叠时间。
I want to find a column to find overlapping time between family members.
我需要该列作为一个人或与另一个人有时间交集的人的索引。
I need that column to be index of a person or persons who has/have time intersection with another one.
在上述示例的第一个家庭中,第一,第二和第四个人的时间有交集。
In the above example first family, the time of first, second and forth persons have intersection.
输出:
household person start time end time overlap
1 1 07:45:00 21:45:00 2,4
1 2 09:45:00 17:45:00 1,4
1 3 22:45:00 23:45:00 NA
1 4 08:45:00 01:45:00 1,2
1 1 18:45:00 19:45:00 NA
2 1 07:45:00 21:45:00 2
2 2 016:45:00 22:45:00 1
NA表示没有与其他家庭成员的交集,它可以是0或其他任何值。
NA means no intersection with other family member it can be 0 or whatever
推荐答案
左加入输入 DF
自身加入同一个家庭中的其他人,并且处于重叠状态。然后将匹配的人逐行连接成逗号分隔的字符串。
Left join the input DF
to itself joining on other persons in the same household and on the overlap condition. Then group by row concatenating the matched persons into a comma separated string.
在没有解释什么构成重叠的情况下,我们尝试了三种不同的重叠定义。第三个是最接近问题中显示的输出。
In the absence of an explanation of what constitutes overlap we try three different definitions of overlap. The third is the closest to the output shown in the question.
-
如果
end_time< start_time
,那么end_time
之前和start_time
之后的所有内容都应检查是否有重叠。然后,根据连接的左右两侧是否满足此条件,将重叠条件分解为4种情况。
if
end_time < start_time
then everything beforeend_time
and afterstart_time
are in the interval to be checked for overlap. The overlap condition then decomposes into 4 cases according to whether the left and right hand sides of the join satisfy this or not.
如果 start_time> end_time
在左侧或右侧,那么我们认为两者不重叠
if start_time > end_time
on either the left or right hand side then we regard the two as not overlapping
如果end_time> start_time则将它们反转并
If end_time > start_time then reverse them and perform overlap as before.
重叠的第一个重叠定义
First overlap definition of overlap
library(sqldf)
sqldf("select a.*, group_concat(distinct b.person) as overlap
from DF a
left join DF b
on a.household = b.household and
a.person != b.person and
(case
when a.start_time <= a.end_time and b.start_time <= b.end_time then
(a.start_time between b.start_time and b.end_time or
b.start_time between a.start_time and a.end_time)
when a.start_time <= a.end_time and b.start_time > b.end_time then
not (a.start_time between b.end_time and b.start_time and
a.end_time between b.end_time and b.start_time)
when a.start_time > a.end_time and b.start_time <= b.end_time then
not (b.start_time between a.end_time and a.start_time and
b.end_time between a.end_time and a.start_time)
else 1 end)
group by a.rowid")
给予:
household person start_time end_time overlap
1 1 1 07:45:00 21:45:00 2
2 1 2 09:45:00 17:45:00 1,4
3 1 3 22:45:00 23:45:00 4
4 1 4 08:45:00 01:45:00 2,3
5 1 1 06:45:00 19:45:00 2
6 2 1 07:45:00 21:45:00 2
7 2 2 016:45:00 22:45:00 1
重叠的第二个重叠定义
Second overlap definition of overlap
library(sqldf)
sqldf("select a.*, group_concat(distinct b.person) as overlap
from DF a
left join DF b
on a.household = b.household and
a.person != b.person and
(case
when a.start_time <= a.end_time and b.start_time <= b.end_time then
(a.start_time between b.start_time and b.end_time or
b.start_time between a.start_time and a.end_time)
else 0 end)
group by a.rowid")
给予:
household person start_time end_time overlap
1 1 1 07:45:00 21:45:00 2
2 1 2 09:45:00 17:45:00 1
3 1 3 22:45:00 23:45:00 <NA>
4 1 4 08:45:00 01:45:00 <NA>
5 1 1 06:45:00 19:45:00 2
6 2 1 07:45:00 21:45:00 2
7 2 2 016:45:00 22:45:00 1
重叠的第三种定义
Third definition of overlap
sqldf("with DF2(rowid, household, person, start_time, end_time, st, en) as (
select rowid, *,
min(start_time, end_time) as st,
max(start_time, end_time) as en
from DF)
select a.household, a.person, a.start_time, a.end_time,
group_concat(distinct b.person) as overlap
from DF2 a
left join DF2 b
on a.household = b.household and
a.person != b.person and
(a.st between b.st and b.en or
b.st between a.st and a.en)
group by a.rowid")
给予:
household person start_time end_time overlap
1 1 1 07:45:00 21:45:00 2,4
2 1 2 09:45:00 17:45:00 1
3 1 3 22:45:00 23:45:00 <NA>
4 1 4 08:45:00 01:45:00 1
5 1 1 06:45:00 19:45:00 2,4
6 2 1 07:45:00 21:45:00 2
7 2 2 16:45:00 22:45:00 1
注意
我们假定可重复输入的输入 DF
是:
DF <- structure(list(household = c(1L, 1L, 1L, 1L, 1L, 2L, 2L), person = c(1L,
2L, 3L, 4L, 1L, 1L, 2L), start_time = c("07:45:00", "09:45:00",
"22:45:00", "08:45:00", "06:45:00", "07:45:00", "16:45:00"),
end_time = c("21:45:00", "17:45:00", "23:45:00", "01:45:00",
"19:45:00", "21:45:00", "22:45:00")), class = "data.frame", row.names = c(NA,
-7L))
这篇关于查找组中个人的开始时间和结束时间之间的重叠时间的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!