Find the duration of overlap between pairs of timestamps using data.table

Views: 73 | Published: 2020/10/15 20:42:02 | Tags: r, datetime, data.table, overlap


Problem description

Similar to this question, I'd like to find the duration of overlap between pairs of timestamps using data.table.

Here's my current code:
library(data.table)
library(dplyr)  # provides %>% and mutate(), both used below

DT <- fread(
  "stage,ID,date1,date2
  1,A,2018-04-17 00:00:00,2018-04-17 01:00:00
  1,B,2018-04-17 00:00:00,2018-04-17 00:20:00
  1,C,2018-04-17 00:15:00,2018-04-17 01:00:00
  2,B,2018-04-17 00:30:00,2018-04-17 01:10:00
  2,D,2018-04-17 00:30:00,2018-04-17 00:50:00",
  sep = ","
)

cols <- c("date1", "date2")
DT[, (cols) := lapply(.SD, as.POSIXct), .SDcols = cols]

breaks <- DT[, {
  tmp <- unique(sort(c(date1, date2)))
  .(start = head(tmp, -1L), end = tail(tmp, -1L))
}, by = stage]

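# Non-equi join: for every elementary interval in `breaks`, collect the IDs
# whose [date1, date2] range covers it, then add the interval length in seconds
result <- DT[breaks, on = .(stage, date1 <= start, date2 >= end), paste(ID, collapse = "+"),
    by = .EACHI, allow.cartesian = TRUE] %>%
  mutate(lengthinseconds = as.numeric(difftime(date2, date1, units = "secs")))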
Which returns:
  stage               date1               date2    V1 lengthinseconds
1     1 2018-04-17 00:00:00 2018-04-17 00:15:00   B+A             900
2     1 2018-04-17 00:15:00 2018-04-17 00:20:00 B+A+C             300
3     1 2018-04-17 00:20:00 2018-04-17 01:00:00   A+C            2400
4     2 2018-04-17 00:30:00 2018-04-17 00:50:00   D+B            1200
5     2 2018-04-17 00:50:00 2018-04-17 01:10:00     B            1200
But I'd like to return only overlaps between user dyads (i.e. intervals in which exactly two users overlap). There are several hacky ways I can think of to achieve this, such as:
library(dplyr)
library(tidyr)

result %>% 
  filter(nchar(V1)==3) %>% 
  tidyr::separate(V1, c("ID1", "ID2"))
Which returns:
  stage               date1               date2 ID1 ID2 lengthinseconds
1     1 2018-04-17 00:00:00 2018-04-17 00:15:00   B   A             900
2     1 2018-04-17 00:20:00 2018-04-17 01:00:00   A   C            2400
3     2 2018-04-17 00:30:00 2018-04-17 00:50:00   D   B            1200
But this seems inelegant, especially when dealing with longer ID strings and potentially hundreds of IDs per overlap.
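To make the fragility concrete, here is a minimal sketch of why filtering on `nchar(V1) == 3` only works for single-character IDs, and how counting the pieces of the collapsed string behaves instead. The vector `v1` and the IDs `user10` and `user2` are made up for illustration and are not part of the data above.

```r
# Hypothetical collapsed ID strings, including one dyad with longer IDs
v1 <- c("B+A", "B+A+C", "user10+user2", "B")

# The nchar() filter only recognises dyads of single-character IDs
by_nchar <- nchar(v1) == 3

# Counting the "+"-separated pieces is independent of ID length
n_ids   <- lengths(strsplit(v1, "+", fixed = TRUE))
is_dyad <- n_ids == 2L

print(data.frame(v1, by_nchar, n_ids, is_dyad))
```

This still parses a string that was pasted together a moment earlier, so it remains a workaround rather than a fix at the source.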

Ideally, I'd like to know if there's a way to modify the original data.table code to return this directly.
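For readers following the code above, the `breaks` step can be looked at in isolation. The sketch below applies the same idiom to a made-up table (`toy`, with integer times `t1` and `t2` instead of timestamps, all hypothetical names) to show how the sorted unique endpoints of each stage are turned into consecutive elementary intervals.

```r
library(data.table)

# Hypothetical toy data: two users in one stage, times as plain integers
toy <- data.table(
  stage = 1L,
  ID    = c("A", "B"),
  t1    = c(0L, 0L),
  t2    = c(60L, 20L)
)

# Same idea as `breaks`: sort the unique endpoints per stage,
# then pair each endpoint with its successor
toy_breaks <- toy[, {
  tmp <- unique(sort(c(t1, t2)))
  .(start = head(tmp, -1L), end = tail(tmp, -1L))
}, by = stage]

print(toy_breaks)
```

The endpoints 0, 20 and 60 yield two intervals, 0 to 20 and 20 to 60. Within each such interval the set of active users cannot change, which is what makes the subsequent non-equi join meaningful.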
Solution
At first glance (and neglecting performance considerations), this requires only a minor modification to OP's code:
result <- DT[breaks, on = .(stage, date1 <= start, date2 >= end), 
             if (.N == 2L) paste(ID, collapse = "+"),  
             by = .EACHI, allow.cartesian = TRUE]
result



   stage               date1               date2  V1
1:     1 2018-04-17 00:00:00 2018-04-17 00:15:00 B+A
2:     1 2018-04-17 00:20:00 2018-04-17 01:00:00 A+C
3:     2 2018-04-17 00:30:00 2018-04-17 00:50:00 D+B

A result row is created only for those groups, i.e., time ranges, in which exactly two users are active.
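The filtering relies on a general data.table idiom: an `if` without an `else` evaluates to `NULL` when the condition is false, and a group whose `j` expression returns `NULL` contributes no rows to the result. A minimal sketch on made-up data (the table `toy` and its columns `g` and `id` are hypothetical):

```r
library(data.table)

# Hypothetical groups of size 2, 1 and 3
toy <- data.table(
  g  = c("x", "x", "y", "z", "z", "z"),
  id = c("A", "B", "C", "D", "E", "F")
)

# Groups where .N != 2 return NULL from j and are dropped
pairs_only <- toy[, if (.N == 2L) .(pair = paste(id, collapse = "+")), by = g]

print(pairs_only)
```

Only group `x` survives, with `pair` equal to `"A+B"`.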



The OP has asked for the two IDs to be shown in separate columns, together with the duration of the overlap. In addition, I suggest sorting the IDs so that each pair always appears in the same order.
result <- DT[breaks, on = .(stage, date1 <= start, date2 >= end), 
   if (.N == 2L) {
     tmp <- sort(ID)
     .(ID1 = tmp[1], ID2 = tmp[2], dur.in.sec = difftime(end, start, units = "secs"))
     },  
   by = .EACHI, allow.cartesian = TRUE]
result



   stage               date1               date2 ID1 ID2 dur.in.sec
1:     1 2018-04-17 00:00:00 2018-04-17 00:15:00   A   B   900 secs
2:     1 2018-04-17 00:20:00 2018-04-17 01:00:00   A   C  2400 secs
3:     2 2018-04-17 00:30:00 2018-04-17 00:50:00   B   D  1200 secs
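Note that `dur.in.sec` is a `difftime` column, whereas the question stored the duration as a plain number (`lengthinseconds`). If a numeric column is preferred, wrapping the `difftime()` call in `as.numeric()` should be enough. A small sketch with two hard-coded timestamps, where the time zone `"UTC"` is an assumption made only to keep the example reproducible:

```r
# Two example timestamps taken from the first result row
t_start <- as.POSIXct("2018-04-17 00:00:00", tz = "UTC")
t_end   <- as.POSIXct("2018-04-17 00:15:00", tz = "UTC")

d <- difftime(t_end, t_start, units = "secs")

secs <- as.numeric(d)                  # plain number of seconds
mins <- as.numeric(d, units = "mins")  # same duration expressed in minutes

print(c(secs = secs, mins = mins))
```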



                        