Filter data based on date/time range and matching id

Problem description

I am trying to filter a dataset based on the date/time range (start and end time) and the id of each row from another dataset. The end result should be a list of filtered data frames.

Below is the code to create the two data sets.

library(dplyr)      # %>%, filter(), between(), inner_join()
library(lubridate)  # ymd_hm()

# This is the dataset to filter:
# two copies of a 10-minute sequence (30 timestamps each, 60 rows in total)
times <- seq(ymd_hm("2019-01-01 07:00"), ymd_hm("2019-01-01 11:50"), by = "10 min")

dataloggers <- data.frame(
  datetime = c(times, times),                                # date/time
  values   = 1:60,                                           # value
  id       = rep(c("a", "b", "c"), times = c(10, 20, 30))    # id
)
head(dataloggers)

# And this is the dataset used to filter the dataset above:
# one row per id, with the start (starttime) and end (datetime) of its window
data <- data.frame(
  starttime = ymd_hm(c("2019-01-01 07:00", "2019-01-01 08:40", "2019-01-01 07:00")),
  datetime  = ymd_hm(c("2019-01-01 08:00", "2019-01-01 10:00", "2019-01-01 08:00")),
  id        = letters[1:3]
)

I have managed to do this using a for() loop to filter the date/time ranges:

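# Requires dplyr (for %>%, filter() and between())
my_list <- list() # create empty list
for (i in seq_along(data$starttime)) {
  # keep the rows whose datetime falls inside the i-th window
  output <- dataloggers %>%
    filter(between(datetime, data$starttime[i], data$datetime[i]))
  my_list[[i]] <- output
}

# stack the per-window results into a single data frame
my_list <- do.call(rbind, my_list)
my_list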

But as you can see, it only filters the data frame based on the start and end time. I need it to also filter it based on the matching id. left_join() doesn't give me what I want because I don't want to merge the datasets. I only want to have a list of filtered dataframes based on those two conditions. Any help would be greatly appreciated.

Recommended answer

There are two ways to do this:

  1. Fuzzy join based on the range:

fuzzyjoin::fuzzy_inner_join(dataloggers, data, 
               by = c('id', 'datetime' = 'starttime', 'datetime'), 
               match_fun = list(`==`, `>=`, `<=`))

  2. Join by id and keep the rows in range:

a. dplyr:

library(dplyr)
dataloggers %>%
  inner_join(data, by = 'id') %>%
  filter(datetime.x >= starttime & datetime.x <= datetime.y)

b. Base R:

subset(merge(dataloggers, data, by = 'id'), 
       datetime.x >= starttime & datetime.x <= datetime.y)
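
The joins above return one combined data frame, whereas the question asks for a list of filtered data frames. A minimal base R sketch of one way to get that list follows. It rebuilds the example data without dplyr or lubridate. The fixed "UTC" time zone, `stringsAsFactors = FALSE`, and the names `times`, `filtered_list` and `keep` are assumptions made for this illustration and are not part of the original post.

```r
# Rebuild the example data with base R only.
# Assumption: a fixed UTC time zone; the original post uses the session default.
times <- seq(as.POSIXct("2019-01-01 07:00", tz = "UTC"),
             as.POSIXct("2019-01-01 11:50", tz = "UTC"), by = "10 min")

dataloggers <- data.frame(
  datetime = c(times, times),
  values   = 1:60,
  id       = rep(c("a", "b", "c"), times = c(10, 20, 30)),
  stringsAsFactors = FALSE
)

data <- data.frame(
  starttime = as.POSIXct(c("2019-01-01 07:00", "2019-01-01 08:40",
                           "2019-01-01 07:00"), tz = "UTC"),
  datetime  = as.POSIXct(c("2019-01-01 08:00", "2019-01-01 10:00",
                           "2019-01-01 08:00"), tz = "UTC"),
  id        = letters[1:3],
  stringsAsFactors = FALSE
)

# One filtered data frame per row of `data`:
# keep rows with the same id AND a datetime inside [starttime, datetime].
filtered_list <- lapply(seq_len(nrow(data)), function(i) {
  keep <- dataloggers$id == data$id[i] &
    dataloggers$datetime >= data$starttime[i] &
    dataloggers$datetime <= data$datetime[i]
  dataloggers[keep, ]
})
names(filtered_list) <- data$id  # naming by id is a convenience choice

# Number of rows kept for each id
sapply(filtered_list, nrow)
```

With this example data, the three list elements should hold 7, 9 and 7 rows for ids a, b and c. If a single data frame is preferred, `do.call(rbind, filtered_list)` stacks them again.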
