Selecting timestamps within range in R

Question

I have two data frames in R.

df1 looks like this:

id       time
1        2018-08-28 11:22:40
2        2018-08-28 11:35:10
3        2018-08-28 11:50:00
4        2018-08-28 11:55:30

df2 looks like this:

start_time             end_time
2018-08-28 11:22:00    2018-08-28 11:22:50
2018-08-28 11:30:30    2018-08-28 11:34:10
2018-08-28 11:49:00    2018-08-28 11:52:20
2018-08-28 11:57:20    2018-08-28 11:59:40

I'm trying to select the rows from the df1 that fall between any of the start_time and end_time pairs in df2. In the example above that would leave me with:

id       time
1        2018-08-28 11:22:40
3        2018-08-28 11:50:00

This problem is similar to that found here but in R instead of SQL. How do I achieve this?

Recommended answer

Here is an option using fuzzyjoin:

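library(fuzzyjoin)
library(tidyverse)

# Left-join df1 to df2 on the interval condition start_time <= time <= end_time
fuzzy_left_join(
    df1 %>% mutate(time = as.POSIXct(time)),
    df2 %>% mutate(
        start_time = as.POSIXct(start_time),
        end_time = as.POSIXct(end_time)),
    by = c("time" = "start_time", "time" = "end_time"),
    match_fun = list(`>=`, `<=`)) %>%
    # Rows of df1 without a matching interval have NA in the df2 columns; drop them
    filter(!is.na(start_time)) %>%
    select(id, time)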
#  id                time
#1  1 2018-08-28 11:22:40
#2  3 2018-08-28 11:50:00

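Explanation: This performs an interval join of df1 and df2 (matching where time >= start_time & time <= end_time). Because it is a left join, rows of df1 that fall in no interval get NA in start_time, so keeping only the rows with a non-NA start_time leaves exactly the entries that lie within a start_time-end_time interval.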

# Sample data (define df1 and df2 before running the join above)
df1 <- read.table(text =
    "id       time
1        '2018-08-28 11:22:40'
2        '2018-08-28 11:35:10'
3        '2018-08-28 11:50:00'
4        '2018-08-28 11:55:30'", header = TRUE)

df2 <- read.table(text =
    "start_time             end_time
'2018-08-28 11:22:00'    '2018-08-28 11:22:50'
'2018-08-28 11:30:30'    '2018-08-28 11:34:10'
'2018-08-28 11:49:00'    '2018-08-28 11:52:20'
'2018-08-28 11:57:20'    '2018-08-28 11:59:40'", header = TRUE)
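
For comparison, the same interval condition (time >= start_time & time <= end_time) can also be checked directly in base R, without a join. This is only a minimal sketch and not part of the original answer: it rebuilds the sample data with data.frame(), assumes the timestamps can be read as UTC, and introduces the helper names in_any_interval and result purely for illustration.

```r
# Sample data from the question, parsed as POSIXct (UTC is an assumption).
df1 <- data.frame(
  id = 1:4,
  time = as.POSIXct(c("2018-08-28 11:22:40", "2018-08-28 11:35:10",
                      "2018-08-28 11:50:00", "2018-08-28 11:55:30"),
                    tz = "UTC")
)
df2 <- data.frame(
  start_time = as.POSIXct(c("2018-08-28 11:22:00", "2018-08-28 11:30:30",
                            "2018-08-28 11:49:00", "2018-08-28 11:57:20"),
                          tz = "UTC"),
  end_time = as.POSIXct(c("2018-08-28 11:22:50", "2018-08-28 11:34:10",
                          "2018-08-28 11:52:20", "2018-08-28 11:59:40"),
                        tz = "UTC")
)

# For each row of df1, test whether its time lies inside at least one
# [start_time, end_time] interval of df2 (both bounds inclusive).
in_any_interval <- vapply(
  seq_len(nrow(df1)),
  function(i) any(df1$time[i] >= df2$start_time & df1$time[i] <= df2$end_time),
  logical(1)
)

# Keep only the rows of df1 that matched some interval.
result <- df1[in_any_interval, ]
print(result)
```

On the sample data this keeps the rows with id 1 and 3, matching the output of the fuzzyjoin approach. It compares every timestamp against every interval, so it is likely best suited to small or moderately sized inputs.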
