Finding overlapping time in each group of a data set
Problem description
I have the columns household, persons (the people in each household), tour (each tour contains different trips for each person), mode (each person's mode of travel in each tour), time_ARR (the start time of the tour), and time_Dep (the end time of the tour).
I want to compute an indicator that relates people who have car mode to people who have non-car mode.
The indicator is 1 for each person who has non-car mode in a tour if the time of that tour intersects with the tour of a person in the same household who has car mode.
Here is an example to make it clear:
family persons mode tour starttime endtime
1 1 car 1 2:30 15:30
1 1 non-car 2 20:00 8:30
1 2 non-car 1 3:00 10:00
1 3 car 1 19:10 24:00
2 1 non-car 1 3:00 10:00
2 2 car 1 19:10 24:00
In the first family, person 1 has non-car mode in his second tour, and that tour intersects with the car tour of person 3.
Person 2 in the first family also has non-car mode, and her tour intersects with the first tour of person 1, who has car mode.
In the second family, person 1 has non-car mode, and the tour does not intersect with the car-mode tour of any other person. So the expected result is:
family persons mode tour starttime endtime indicator
1 1 car 1 2:30 15:30 NA
1 1 non-car 2 20:00 8:30 1
1 2 non-car 1 3:00 10:00 1
1 3 car 1 19:10 24:00 NA
2 1 non-car 1 3:00 10:00 0
2 2 car 1 19:10 24:00 NA
Instead of NA it can be 0 or 1; it does not matter at all.
One way to look at it is to use data.table::foverlaps, treating the times as overlapping events.
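To make "overlapping events" concrete: two closed intervals overlap when each one starts no later than the other ends. The helper below is hypothetical (it is not part of data.table) and uses decimal hours purely for illustration.

```r
# Hypothetical helper, not a data.table function:
# TRUE when the closed intervals [s1, e1] and [s2, e2] share any point in time
overlaps <- function(s1, e1, s2, e2) s1 <= e2 & s2 <= e1

overlaps(3, 10, 2.5, 15.5)     # 3:00-10:00 vs 2:30-15:30  -> TRUE
overlaps(3, 10, 19 + 1/6, 24)  # 3:00-10:00 vs 19:10-24:00 -> FALSE
```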
Prepping data
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
family persons mode tour starttime endtime
1 1 car 1 2:30 15:30
1 1 non-car 2 20:00 8:30
1 2 non-car 1 3:00 10:00
1 3 car 1 19:10 24:00
2 1 non-car 1 3:00 10:00
2 2 car 1 19:10 24:00")
library(data.table)
setDT(dat)
# convert to actual timestamps ... might also use lubridate or hms packages
dat[, c("starttime", "endtime") := lapply(.(starttime, endtime), as.POSIXct, format = "%H:%M") ]
# assign a simple per-row id
dat[, rowid := seq_len(.N)]
Unfortunately, because your sample data lists only times (no dates), one tour appears to end before it starts, so I'll shift its endtime to "tomorrow":
dat[starttime > endtime,]
# family persons mode tour starttime endtime rowid
# 1: 1 1 non-car 2 2019-07-29 20:00:00 2019-07-29 08:30:00 2
dat[starttime > endtime, endtime := endtime + 86400 ]
Fuzzy Overlaps
setkey(dat, starttime, endtime)
merged <- foverlaps(dat[,.(rowid,mode,starttime,endtime)], dat[,.(rowid,mode,starttime,endtime)])
merged[ mode == "car" & i.mode != "car", ]
# rowid mode starttime endtime i.rowid i.mode i.starttime i.endtime
# 1: 1 car 2019-07-29 02:30:00 2019-07-29 15:30:00 3 non-car 2019-07-29 03:00:00 2019-07-29 10:00:00
# 2: 1 car 2019-07-29 02:30:00 2019-07-29 15:30:00 5 non-car 2019-07-29 03:00:00 2019-07-29 10:00:00
# 3: 4 car 2019-07-29 19:10:00 2019-07-30 00:00:00 2 non-car 2019-07-29 20:00:00 2019-07-30 08:30:00
# 4: 6 car 2019-07-29 19:10:00 2019-07-30 00:00:00 2 non-car 2019-07-29 20:00:00 2019-07-30 08:30:00
The gist to take away from this is that i.rowid shows the "second person", who is "non-car", while rowid shows the first person, who is "car". From this, it's easy enough to determine:
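# rows that are not a non-car tour with a "car" complement
# (in this sample only the car rows 1, 4 and 6 remain, because every non-car row found a match)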
setdiff(dat$rowid, merged[ mode == "car" & i.mode != "car", ]$i.rowid)
# [1] 1 4 6
# non-car people with a car complement
unique(merged[ mode == "car" & i.mode != "car", ]$i.rowid)
# [1] 3 5 2
# non-car people might be able to use these car people
merged[ mode == "car" & i.mode != "car", ][, .(hascar = rowid, needscar = i.rowid)]
# hascar needscar
# 1: 1 3
# 2: 1 5
# 3: 4 2
# 4: 6 2
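Note that the join above keys only on the times, so tours from different families are matched as well: row 5 (family 2) pairs with row 1 (family 1), whereas the question expects 0 for that row. Here is a minimal sketch of one way to restrict the match to the same family and build the indicator column, assuming the same sample data. The names cars, noncars, hits and car_rowid are introduced here for illustration only, and parsing the times in UTC is an assumption made so that adding 86400 seconds is exactly one day.

```r
library(data.table)

dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
family persons mode tour starttime endtime
1 1 car     1 2:30  15:30
1 1 non-car 2 20:00 8:30
1 2 non-car 1 3:00  10:00
1 3 car     1 19:10 24:00
2 1 non-car 1 3:00  10:00
2 2 car     1 19:10 24:00")
setDT(dat)

# parse clock times (assumption: UTC, to avoid daylight-saving effects)
cols <- c("starttime", "endtime")
dat[, (cols) := lapply(.SD, as.POSIXct, format = "%H:%M", tz = "UTC"), .SDcols = cols]
# a tour that ends "before" it starts is assumed to end on the next day
dat[starttime > endtime, endtime := endtime + 86400]
dat[, rowid := seq_len(.N)]

# split into car and non-car tours; key the car tours by family + interval
cars    <- dat[mode == "car", .(family, starttime, endtime, car_rowid = rowid)]
noncars <- dat[mode != "car", .(family, starttime, endtime, rowid)]
setkey(cars, family, starttime, endtime)

# overlap join within the same family; nomatch = 0L drops tours without a match
hits <- foverlaps(noncars, cars,
                  by.x = c("family", "starttime", "endtime"),
                  nomatch = 0L)

# indicator: 1/0 for non-car tours, NA for car tours
dat[, indicator := NA_integer_]
dat[mode != "car", indicator := as.integer(rowid %in% hits$rowid)]
dat[, .(family, persons, mode, tour, indicator)]
```

This sketch does not exclude a person's own car tours from the match; if that matters, persons could be carried into both tables and filtered on after the join.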