Merge two datasets based on multiple columns but make time column flexible within minute range


Problem description

I have a dataset that looks like this

id | date     | social_id | race | age | time       | Location
 1 | 04/02/19 | 2000001   | W    | 29  | "04:10:05" | HA
 2 | 04/06/20 | 2000002   | B    | 22  | "05:12:49" | CA
 3 | 04/12/20 | 2000021   | B    | 26  | "09:13:32" | MA
 4 | 08/14/20 | 2000026   | A    | 29  | "06:12:34" | VT

The second dataset looks like this

id2 | date     | social_id | race | age | time       | sex
  1 | 04/02/19 | 2000001   | W    | 29  | "04:30:05" | M
  2 | 04/06/20 | 2000002   | B    | 22  | "05:49:49" | F
  3 | 04/12/20 | 2000021   | B    | 26  | "10:13:32" | M
  4 | 08/14/20 | 2000026   | A    | 29  | "06:19:54" | F

Note that the shared columns hold the same values in both datasets except for time. I would like to do a join based on the columns date, social_id, race, age, and time. However, the time values do not match between the two datasets, so an exact join like this one finds no matches:

df3 <- df1 %>% left_join(df2,by=c("date","social_id","race","time"))

Is there a way to do a multiple-column join but make an exception for time, so that rows match when their times are within 45 minutes of each other? Time is in string format, so I adjusted for it by writing

abs(difftime(as.POSIXct(strptime(df1$time,format="%H:%M:%S")), as.POSIXct(strptime(df2$time,format="%H:%M:%S")),units = "mins")) <= 45

This works on its own and recognizes whether two time strings are within 45 minutes of each other. How can I bring this together when I do the merge?

structure(list(id = 1:4, date = c("4/2/2019", "4/6/2020", "4/12/2020", 
"8/14/2020"), race = c("w", "b", "b", "a"), age = c(29L, 22L, 
26L, 29L), time = structure(c(15005L, 18769L, 33212L, 22354L), class = 
"ITime")), row.names = c(NA, 
-4L), class = "data.frame")

structure(list(id2 = 1:4, date = c("4/2/2019", "4/6/2020", "4/12/2020", 
"8/14/2020"), race = c("w", "b", "b", "a"), age = c(29L, 22L, 
26L, 29L), time = structure(c(16205L, 20989L, 36812L, 22794L), class = 
"ITime")), row.names = c(NA, 
-4L), class = "data.frame")

Recommended answer

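We could use round_date from lubridate to round the combined date and time in both datasets to a 45-minute unit and then join on the rounded value. Note that this only approximates a 45-minute tolerance: rounding assigns fixed bins, so two times less than 45 minutes apart can still land in different bins and fail to match. The join below also matches id to id2.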

library(dplyr)
library(lubridate)
library(stringr)

# Combine date and time into one datetime and round it to a 45-minute unit
df1 %>% 
    mutate(datetime = round_date(mdy_hms(str_c(date, time,
       sep = ' ')), '45 mins')) %>% 
    # Apply the same rounding to df2, then join on the rounded datetime
    left_join(df2 %>%
       mutate(datetime = round_date(mdy_hms(str_c(date, time,
          sep = ' ')), '45 mins')), 
    by = c('datetime', 'id' = 'id2', 'race', 'age'))
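
If an exact 45-minute tolerance is needed rather than fixed rounding bins, one possible alternative (not part of the original answer) is to join on the exactly matching columns first and then keep only the pairs whose times differ by at most 45 minutes. The sketch below is a minimal illustration under stated assumptions: the sample data is copied from the tables in the question with time kept as an "HH:MM:SS" string, to_secs is a hypothetical helper introduced here, and the key columns are assumed to be date, social_id, race, and age.

```r
library(dplyr)

# Sample data copied from the question's tables (time kept as character)
df1 <- data.frame(
  id = 1:4,
  date = c("04/02/19", "04/06/20", "04/12/20", "08/14/20"),
  social_id = c(2000001, 2000002, 2000021, 2000026),
  race = c("W", "B", "B", "A"),
  age = c(29L, 22L, 26L, 29L),
  time = c("04:10:05", "05:12:49", "09:13:32", "06:12:34"),
  Location = c("HA", "CA", "MA", "VT"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  id2 = 1:4,
  date = c("04/02/19", "04/06/20", "04/12/20", "08/14/20"),
  social_id = c(2000001, 2000002, 2000021, 2000026),
  race = c("W", "B", "B", "A"),
  age = c(29L, 22L, 26L, 29L),
  time = c("04:30:05", "05:49:49", "10:13:32", "06:19:54"),
  sex = c("M", "F", "M", "F"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: seconds since midnight from an "HH:MM:SS" string
to_secs <- function(x) {
  parts <- do.call(rbind, strsplit(as.character(x), ":", fixed = TRUE))
  as.numeric(parts[, 1]) * 3600 +
    as.numeric(parts[, 2]) * 60 +
    as.numeric(parts[, 3])
}

# Assumed exact-match key columns
keys <- c("date", "social_id", "race", "age")

# Pair rows on the exact keys, then keep pairs at most 45 minutes apart
matched <- df1 %>%
  inner_join(df2, by = keys, suffix = c("_1", "_2")) %>%
  filter(abs(to_secs(time_1) - to_secs(time_2)) <= 45 * 60)

# Keep every df1 row, as left_join would; unmatched rows get NA
result <- df1 %>%
  left_join(select(matched, id, id2, time_2, sex), by = "id")
```

With this sample data, rows 1, 2, and 4 find a partner, while row 3 is left unmatched because its two times are exactly 60 minutes apart. This approach compares times within the same date only, so it would not pair times that fall on either side of midnight.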
