Inner_join with two conditions and interval within interval condition

Problem description

I am trying to join two data frames on multiple key columns plus a time-interval condition, as in the following example:

# two sample data frames with time intervals
library(dplyr)      # %>%, mutate(), select(), inner_join()
library(lubridate)  # interval(), %within%

# hms::as_hms() replaces the deprecated hms::as.hms()
df1 <- data.frame(key1 = c("a", "b", "c", "d", "e"),
                   key2 = c(1:5),
                   time1 = as.POSIXct(hms::as_hms(c("00:00:15", "00:15:15", "00:30:15", "00:40:15", "01:10:15"))),
                   time2 = as.POSIXct(hms::as_hms(c("00:05:15", "00:20:15", "00:35:15", "00:45:15", "01:15:15")))) %>%
  mutate(t1 = interval(time1, time2)) %>%
  select(key1, key2, t1)

df2 <- data.frame(key1 = c("b", "c", "a", "e", "d"),
                   key2 = c(2, 6, 1, 8, 5),
                   sam1 = as.POSIXct(hms::as_hms(c("00:21:15", "00:31:15", "00:03:15", "01:20:15", "00:43:15"))),
                   sam2 = as.POSIXct(hms::as_hms(c("00:23:15", "00:34:15", "00:04:15", "01:25:15", "00:44:15")))) %>%
  mutate(t2 = interval(sam1, sam2)) %>%
  select(key1, key2, t2)

The first requirement is that columns key1 and key2 match, which can be done with the following (this produces an error):

df <- inner_join(df1, df2, by = c("key1", "key2"))

One more condition has to be checked when joining: whether the interval t2 lies within t1. I can check this manually like this:

df$t2 %within% df$t1

I suspect the error comes from joining data frames that contain interval columns, so this may not be the right way to do it.

# desired dataframe
df <- data.frame(key1 = c("a", "b"), key2 = c(1,2), time_condition = c(TRUE, FALSE))

For example, if t1 runs from "00:00:15" to "00:05:15", then the corresponding t2, which runs from "00:03:15" to "00:04:15", lies within t1. The time_condition column should be TRUE when t2 is within t1 and FALSE otherwise.
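One way to get this result while staying in dplyr is sketched below. It rests on an assumption: the start and end times are kept as plain date-time columns instead of being packed into interval columns before the join, so the join itself only touches ordinary columns. The helper `at()` is hypothetical and only pins the times to a fixed date; the inclusive comparisons (`>=`, `<=`) are meant to mirror how `%within%` treats endpoints.

```r
library(dplyr)

# Hypothetical helper: put "HH:MM:SS" strings on one fixed date so they compare cleanly
at <- function(x) as.POSIXct(paste("1970-01-01", x), tz = "UTC")

df1 <- data.frame(
  key1  = c("a", "b", "c", "d", "e"),
  key2  = c(1, 2, 3, 4, 5),
  time1 = at(c("00:00:15", "00:15:15", "00:30:15", "00:40:15", "01:10:15")),
  time2 = at(c("00:05:15", "00:20:15", "00:35:15", "00:45:15", "01:15:15"))
)
df2 <- data.frame(
  key1 = c("b", "c", "a", "e", "d"),
  key2 = c(2, 6, 1, 8, 5),
  sam1 = at(c("00:21:15", "00:31:15", "00:03:15", "01:20:15", "00:43:15")),
  sam2 = at(c("00:23:15", "00:34:15", "00:04:15", "01:25:15", "00:44:15"))
)

# Join on the keys first, then test whether [sam1, sam2] lies inside [time1, time2]
result <- inner_join(df1, df2, by = c("key1", "key2")) %>%
  mutate(time_condition = sam1 >= time1 & sam2 <= time2) %>%
  select(key1, key2, time_condition)

print(result)
```

Only the key pairs (a, 1) and (b, 2) appear in both data frames, so the result has two rows, with time_condition TRUE for a and FALSE for b.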

Solution

With data.table, you can evaluate a condition while joining. Here is an example:

library(data.table)
df2[df1, # left join
    .(time_condition = sam1 > time1 & sam2 < time2), # condition while joining
    on = .(key1, key2), # keys
    by = .EACHI, # check condition per join
    nomatch = 0L] # make it an inner join

#    key1 key2 time_condition
# 1:    a    1           TRUE
# 2:    b    2          FALSE


# your data, regenerated as data.tables (define these after library(data.table) and before running the join above)

df1 <- data.table(key1 = c("a", "b", "c", "d", "e"),
                  key2 = c(1:5),
                  time1 = as.ITime(c("00:00:15", "00:15:15", "00:30:15", "00:40:15", "01:10:15")),
                  time2 = as.ITime(c("00:05:15", "00:20:15", "00:35:15", "00:45:15", "01:15:15"))) 
df2 <- data.table(key1 = c("b", "c", "a", "e", "d"),
                  key2 = c(2, 6, 1, 8, 5),
                  sam1 = as.ITime(c("00:21:15", "00:31:15", "00:03:15", "01:20:15", "00:43:15")),
                  sam2 = as.ITime(c("00:23:15", "00:34:15", "00:04:15", "01:25:15", "00:44:15")))
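
If only the pairs whose sample interval actually lies inside the reference interval are wanted, rather than a TRUE/FALSE flag, a non-equi join is one possible variation. The sketch below is not part of the answer above. It assumes a data.table version that supports inequality conditions in `on`, and it uses inclusive bounds (`>=`, `<=`), whereas the join above uses strict ones. The name `contained` is made up for illustration.

```r
library(data.table)

df1 <- data.table(key1 = c("a", "b", "c", "d", "e"),
                  key2 = c(1, 2, 3, 4, 5),
                  time1 = as.ITime(c("00:00:15", "00:15:15", "00:30:15", "00:40:15", "01:10:15")),
                  time2 = as.ITime(c("00:05:15", "00:20:15", "00:35:15", "00:45:15", "01:15:15")))
df2 <- data.table(key1 = c("b", "c", "a", "e", "d"),
                  key2 = c(2, 6, 1, 8, 5),
                  sam1 = as.ITime(c("00:21:15", "00:31:15", "00:03:15", "01:20:15", "00:43:15")),
                  sam2 = as.ITime(c("00:23:15", "00:34:15", "00:04:15", "01:25:15", "00:44:15")))

# Equality on the keys, plus sam1 >= time1 and sam2 <= time2 as join conditions;
# nomatch = 0L drops df1 rows that have no matching df2 row
contained <- df2[df1,
                 .(key1, key2),
                 on = .(key1, key2, sam1 >= time1, sam2 <= time2),
                 nomatch = 0L]

print(contained)
```

With the sample data, only the (a, 1) pair satisfies all four conditions, so `contained` has a single row.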
