Find nearest preceding and following dates between data frames


Question

I have the following two data frames:

df1 <- data.frame(ID = c("A","A","B","B","C","D","D","D","E"),
             Date = as.POSIXct(c("2018-04-12 08:56:00","2018-04-13 11:03:00","2018-04-14 14:30:00","2018-04-15 03:10:00","2018-04-16 07:28:00","2018-04-17 11:17:00","2018-04-17 14:21:00","2018-04-18 09:56:00","2018-05-02 07:49:00")))

df2 <- data.frame(ID = c("A","A","A","B","C","D","D","D","D","D","E"),
              Date = as.POSIXct(c("2018-04-10 07:11:00","2018-04-11 18:59:00","2018-04-12 12:37:00","2018-04-15 01:43:00","2018-04-21 09:52:00","2018-04-15 20:25:00","2018-04-17 12:33:00","2018-04-17 14:21:00","2018-04-18 10:59:00","2018-04-20 14:11:00","2018-05-01 09:50:00")))

For df1, I would like to do two things. First, I want to find the nearest preceding date, by ID, from df2. Second, I want to find the nearest following date, by ID, from df2. In both cases, I do not want dates from df2 to be repeated in df1.

Using the roll = Inf feature of the data.table package, I am able to merge in the preceding dates by ID.

library(data.table)

setDT(df1)
setDT(df2)

setkey(df1, ID, Date)
setkey(df2, ID, Date)[, PrecedingDate:=Date]

result <- df2[df1, roll=Inf]

I'm unsure how to pull the nearest following date from df2 into df1, and how to ensure that dates are not repeated.

The result should look like this:

result <- data.frame(ID = c("A","A","B","B","C","D","D","D","E"),
                     Date = as.POSIXct(c("2018-04-12 08:56:00","2018-04-13 11:03:00","2018-04-14 14:30:00","2018-04-15 03:10:00","2018-04-16 07:28:00","2018-04-17 11:17:00","2018-04-17 14:21:00","2018-04-18 09:56:00","2018-05-02 07:49:00")),
                     PrecedingDate = as.POSIXct(c("2018-04-11 18:59:00","2018-04-12 12:37:00",NA,"2018-04-15 01:43:00",NA,"2018-04-15 20:25:00","2018-04-17 14:21:00",NA,"2018-05-01 09:50:00")),
                     FollowingDate = as.POSIXct(c("2018-04-12 12:37:00",NA,"2018-04-15 01:43:00",NA,"2018-04-21 09:52:00","2018-04-17 12:33:00","2018-04-17 14:21:00","2018-04-18 10:59:00",NA)))

Any help here would be most appreciated.

Recommended answer

Here's a solution using dplyr. You may get warnings from min() and max() when a row in df1 has no preceding or no following date in df2, because those functions are then called on an empty vector. You can safely ignore or suppress them.

library(dplyr)

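# Given time differences x (df2 date minus df1 date), return the positions
# of the largest negative value (nearest preceding date) and the smallest
# positive value (nearest following date). Exact matches (x == 0) are skipped.
closest_to_zero <- function(x) {
  neg <- which(x == max(x[x < 0]))  # empty if nothing precedes
  pos <- which(x == min(x[x > 0]))  # empty if nothing follows
  # A missing name makes the later ["previous"] / ["following"] lookup NA
  c(previous = neg, following = pos)
}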
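
The min/max warnings arise when there is no negative or no positive difference, since max() or min() is then called on an empty vector. As a minimal sketch, a variant of the helper that avoids those calls could look like the following. The name closest_to_zero_quiet is hypothetical and not part of the original answer, and unlike the original it picks the first position when several differences tie.

```r
# Hypothetical warning-free variant of closest_to_zero (not from the original answer)
closest_to_zero_quiet <- function(x) {
  x <- as.numeric(x)   # drop difftime units; only sign and magnitude matter
  neg <- which(x < 0)  # candidates that precede
  pos <- which(x > 0)  # candidates that follow
  c(
    previous  = if (length(neg)) neg[which.max(x[neg])] else NA_integer_,
    following = if (length(pos)) pos[which.min(x[pos])] else NA_integer_
  )
}

# Toy check: -2 (position 2) is the nearest preceding, 3 (position 3) the nearest following
demo_both <- closest_to_zero_quiet(c(-5, -2, 3, 7))
# With no negative values, "previous" is NA instead of triggering a warning
demo_none <- closest_to_zero_quiet(c(1, 2))
```

Because both names are always present, indexing with ["previous"] or ["following"] works the same way as with the original helper.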

result <- left_join(df1, df2, by = "ID") %>%
  group_by(ID, Date.x) %>%
  mutate(
    time_diff = Date.y - Date.x,
    Preceding = Date.y[closest_to_zero(time_diff)["previous"]],
    Following = Date.y[closest_to_zero(time_diff)["following"]]
  ) %>%
  distinct(ID, Date.x, Preceding, Following)

# A tibble: 9 x 4
# Groups:   ID, Date.x [9]
  ID    Date.x              Preceding           Following          
  <fct> <dttm>              <dttm>              <dttm>             
1 A     2018-04-12 08:56:00 2018-04-11 18:59:00 2018-04-12 12:37:00
2 A     2018-04-13 11:03:00 2018-04-12 12:37:00 NA                 
3 B     2018-04-14 14:30:00 NA                  2018-04-15 01:43:00
4 B     2018-04-15 03:10:00 2018-04-15 01:43:00 NA                 
5 C     2018-04-16 07:28:00 NA                  2018-04-21 09:52:00
6 D     2018-04-17 11:17:00 2018-04-15 20:25:00 2018-04-17 12:33:00
7 D     2018-04-17 14:21:00 2018-04-17 12:33:00 2018-04-18 10:59:00
8 D     2018-04-18 09:56:00 2018-04-17 14:21:00 2018-04-18 10:59:00
9 E     2018-05-02 07:49:00 2018-05-01 09:50:00 NA                 
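
Since the question starts from a data.table rolling join, a sketch of that route may also be useful. roll = Inf carries the last df2 date forward (nearest preceding), while roll = -Inf carries the next df2 date backward (nearest following). The MatchDate column name and the cut-down sample data are assumptions for illustration. Note two differences from the dplyr answer: a rolling join counts an exact match as both preceding and following, and this sketch does not enforce the "no repeated dates" requirement.

```r
library(data.table)

# Cut-down sample data (assumption: a subset of the question's data, fixed to UTC)
df1 <- data.table(
  ID = c("A", "A", "B"),
  Date = as.POSIXct(c("2018-04-12 08:56:00", "2018-04-13 11:03:00",
                      "2018-04-14 14:30:00"), tz = "UTC")
)
df2 <- data.table(
  ID = c("A", "A", "A", "B"),
  Date = as.POSIXct(c("2018-04-10 07:11:00", "2018-04-11 18:59:00",
                      "2018-04-12 12:37:00", "2018-04-15 01:43:00"), tz = "UTC")
)

# Copy the join column so the matched df2 timestamp survives the join
df2[, MatchDate := Date]

# roll = Inf: nearest df2 date at or before each df1 date, within ID
preceding <- df2[df1, on = .(ID, Date), roll = Inf]$MatchDate
# roll = -Inf: nearest df2 date at or after each df1 date, within ID
following <- df2[df1, on = .(ID, Date), roll = -Inf]$MatchDate

result <- copy(df1)[, `:=`(PrecedingDate = preceding, FollowingDate = following)]
```

Rows of df1 with no match on a given side get NA in the corresponding column.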
