Finding overlapping time in each group of data set

Views: 47  Published: 2020/10/17 2:39:44  Tags: r, dataframe


Problem description

I have a data set with columns for household, persons (the person within each household), tour (each tour contains different trips for each person), mode (each person's mode of travel in each tour), time_ARR (start time of the tour), and time_Dep (end time of the tour).

I want to compute an indicator that relates people who travel by car to people who do not.

The indicator is 1 for a person with a non-car mode in a tour if the time of that tour intersects with the tour of a person in the same household whose mode is car.

Here is an example to make it clear:

  family    persons    mode    tour   start time    end time
     1      1           car     1        2:30         15:30
     1      1         non-car   2        20:00        8:30
     1      2         non-car   1        3:00         10:00
     1      3           car     1        19:10        24:00
     2      1         non-car   1        3:00         10:00
     2      2           car     1        19:10        24:00

In the first family, person 1 has a non-car mode in his second tour, and that tour intersects with the tour of the third person.

The second person in the first family also has a non-car mode, and her tour intersects with the first tour of the first person.

In the second family, person 1 has a non-car mode, and that tour does not intersect with the car-mode tour of anyone else. So the expected result is:

  family    persons    mode    tour   start time    end time    indicator
     1      1           car     1        2:30         15:30        NA
     1      1         non-car   2        20:00        8:30         1
     1      2         non-car   1        3:00         10:00        1
     1      3           car     1        19:10        24:00        NA
     2      1         non-car   1        3:00         10:00        0
     2      2           car     1        19:10        24:00        NA

Instead of NA it can be 0 or 1; it does not matter at all.

Solution

One way to look at it is to use data.table::foverlaps, using the times as overlapping events.
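
As a rough illustration of what "overlapping" means here: with its default type = "any", foverlaps treats two closed intervals as overlapping when each one starts no later than the other ends. A minimal base-R sketch of that test follows. The helper name intervals_overlap is made up for illustration, and the times are written as minutes since midnight rather than as timestamps.

```r
# TRUE when the closed intervals [a_start, a_end] and [b_start, b_end]
# share at least one point; vectorised over its arguments.
# (intervals_overlap is a hypothetical helper, not part of data.table.)
intervals_overlap <- function(a_start, a_end, b_start, b_end) {
  a_start <= b_end & b_start <= a_end
}

# Times in minutes since midnight, taken from the question's example.
same_morning <- intervals_overlap(150, 930, 180, 600)   # car 2:30-15:30 vs non-car 3:00-10:00
evening_car  <- intervals_overlap(1150, 1440, 180, 600) # car 19:10-24:00 vs non-car 3:00-10:00

same_morning  # TRUE
evening_car   # FALSE
```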

Prepping data

dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
  family    persons    mode    tour   starttime    endtime
     1      1           car     1        2:30         15:30
     1      1         non-car   2        20:00        8:30
     1      2         non-car   1        3:00         10:00
     1      3           car     1        19:10        24:00
     2      1         non-car   1        3:00         10:00
     2      2           car     1        19:10        24:00")
library(data.table)
setDT(dat)

# convert to actual timestamps ... might also use lubridate or hms packages
dat[, c("starttime", "endtime") := lapply(.(starttime, endtime), as.POSIXct, format = "%H:%M") ]
# assign a simple per-row id
dat[, rowid := seq_len(.N)]

Unfortunately, because you only list times in your sample data, you have a backwards event, so I'll shift the endtime to "tomorrow":

dat[starttime > endtime,]
#    family persons    mode tour           starttime             endtime rowid
# 1:      1       1 non-car    2 2019-07-29 20:00:00 2019-07-29 08:30:00     2
dat[starttime > endtime, endtime := endtime + 86400 ]
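
If the calendar date that as.POSIXct attaches is not wanted, one alternative (a sketch, not what the code above does) is to work in minutes since midnight and push an end time that falls before its start time forward by one day. The helper to_minutes below is hypothetical, and the assumption that "end before start" always means "next day" comes from the sample data only.

```r
# Convert "H:M" strings to integer minutes since midnight (base R only).
# (to_minutes is a hypothetical helper introduced for this sketch.)
to_minutes <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(
    parts,
    function(p) as.integer(p[1]) * 60L + as.integer(p[2]),
    integer(1),
    USE.NAMES = FALSE
  )
}

start <- to_minutes(c("2:30", "20:00", "19:10"))
end   <- to_minutes(c("15:30", "8:30", "24:00"))

# Assumption: an end time earlier than its start belongs to the next day.
overnight <- end < start
end[overnight] <- end[overnight] + 24L * 60L

start  # 150 1200 1150
end    # 930 1950 1440
```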

Fuzzy Overlaps

setkey(dat, starttime, endtime)
merged <- foverlaps(dat[,.(rowid,mode,starttime,endtime)], dat[,.(rowid,mode,starttime,endtime)])
merged[ mode == "car" & i.mode != "car", ]
#    rowid mode           starttime             endtime i.rowid  i.mode         i.starttime           i.endtime
# 1:     1  car 2019-07-29 02:30:00 2019-07-29 15:30:00       3 non-car 2019-07-29 03:00:00 2019-07-29 10:00:00
# 2:     1  car 2019-07-29 02:30:00 2019-07-29 15:30:00       5 non-car 2019-07-29 03:00:00 2019-07-29 10:00:00
# 3:     4  car 2019-07-29 19:10:00 2019-07-30 00:00:00       2 non-car 2019-07-29 20:00:00 2019-07-30 08:30:00
# 4:     6  car 2019-07-29 19:10:00 2019-07-30 00:00:00       2 non-car 2019-07-29 20:00:00 2019-07-30 08:30:00

The gist to take away from this is that i.rowid shows the "second person" who is "non-car" while the first person is "car". From this, it's easy enough to determine

# rows with no overlapping "car" complement (with this data, only the car rows themselves)
setdiff(dat$rowid, merged[ mode == "car" & i.mode != "car", ]$i.rowid)
# [1] 1 4 6

# non-car people with a car complement
unique(merged[ mode == "car" & i.mode != "car", ]$i.rowid)
# [1] 3 5 2

# non-car people might be able to use these car people
merged[ mode == "car" & i.mode != "car", ][, .(hascar = rowid, needscar = i.rowid)]
#    hascar needscar
# 1:      1        3
# 2:      1        5
# 3:      4        2
# 4:      6        2
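
Note that the pairs above are not restricted to a household: rowid 1 (family 1) is paired with rowid 5 (family 2), and rowid 6 (family 2) with rowid 2 (family 1). One possible way to get the per-household indicator asked for in the question is to add family as a leading key column, which foverlaps matches for equality before it compares the intervals. The sketch below rests on assumptions that are not in the answer: times are stored as integer minutes since midnight (with the overnight tour already shifted to the next day), and the names cars, noncars, hits and car_rowid are illustrative only.

```r
library(data.table)

# Sample data from the question; times are minutes since midnight (assumption).
dat <- data.table(
  family    = c(1L, 1L, 1L, 1L, 2L, 2L),
  persons   = c(1L, 1L, 2L, 3L, 1L, 2L),
  mode      = c("car", "non-car", "non-car", "car", "non-car", "car"),
  tour      = c(1L, 2L, 1L, 1L, 1L, 1L),
  starttime = c(150L, 1200L, 180L, 1150L, 180L, 1150L),
  endtime   = c(930L, 1950L, 600L, 1440L, 600L, 1440L)
)
dat[, rowid := seq_len(.N)]

# Split into car tours (lookup table) and non-car tours (queries).
cars    <- dat[mode == "car", .(family, starttime, endtime, car_rowid = rowid)]
noncars <- dat[mode != "car", .(family, starttime, endtime, rowid)]

# family is matched for equality; the last two key columns form the interval.
setkey(cars, family, starttime, endtime)
hits <- foverlaps(noncars, cars, nomatch = 0L)

# 1 if a non-car tour overlaps a car tour in the same family, 0 if not,
# NA for the car tours themselves.
dat[, indicator := NA_integer_]
dat[mode != "car", indicator := as.integer(rowid %in% hits$rowid)]

dat$indicator  # NA 1 1 NA 0 NA
```

With the sample data this reproduces the indicator column from the question, including the 0 for the non-car person in the second family.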
