Join with fuzzy matching by date in R
Question
I have two data frames that I'd like to join by date:
library(dplyr)      # %>% and as_tibble()
library(lubridate)  # ymd()

df1 <-
  data.frame(
    day = seq(ymd("2020-01-01"), ymd("2020-01-14"), by = "1 day"),
    key = rep(c("green", "blue"), 7),
    value_x = sample(1:100, 14)
  ) %>%
  as_tibble()

df2 <-
  data.frame(
    day = seq(ymd("2020-01-01"), ymd("2020-01-12"), by = "3 days"),
    key = rep(c("green", "blue"), 2),
    value_y = c(2, 4, 6, 8)
  ) %>%
  as_tibble()
I'd like the output to look like this:
# A tibble: 14 x 4
   day        key   value_x value_y
   <date>     <fct>   <int>   <int>
 1 2020-01-01 green      91       2
 2 2020-01-02 blue       28      NA
 3 2020-01-03 green      75       2
 4 2020-01-04 blue       14       4
 5 2020-01-05 green       3       2
 6 2020-01-06 blue       27       4
 7 2020-01-07 green      15       6
 8 2020-01-08 blue        7       4
 9 2020-01-09 green       1       6
10 2020-01-10 blue       10       8
11 2020-01-11 green       9       6
12 2020-01-12 blue       76       8
13 2020-01-13 green      31       6
14 2020-01-14 blue       62       8
I tried this code:
merge(df1, df2, by = c("day", "key"), all.x = TRUE)
I'd like each day in the left table to join to the most recent day in the y table (df2) that has a value. If there is no such value, the result should be NA.
Edit:
Not all the dates in df2 will appear in df1, but the two tables do share a common ID. Here is an example:
df1
day id key
1 2020-01-08 A green
2 2020-01-10 A green
3 2020-02-24 A blue
4 2020-03-24 A green
df2
day id value
1 2020-01-03 A 2
2 2020-01-07 A 4
3 2020-01-22 A 4
4 2020-03-24 A 6
desired output
day id key value
1 2020-01-08 A green 4
2 2020-01-10 A green 4
3 2020-02-24 A blue 4
4 2020-03-24 A green 6
Recommended answer
After merging, you can arrange the data by key and day, then fill value_y with the most recent non-NA value within each key.
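library(dplyr)

merge(df1, df2, by = c('day', 'key'), all.x = TRUE) %>%
  arrange(key, day) %>%       # order each key's rows chronologically
  group_by(key) %>%
  tidyr::fill(value_y) %>%    # carry the last non-NA value_y forward within each key
  arrange(day)                # restore the original date order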
# day key value_x value_y
#1 2020-01-01 green 40 2
#2 2020-01-02 blue 45 NA
#3 2020-01-03 green 54 2
#4 2020-01-04 blue 11 4
#5 2020-01-05 green 12 2
#6 2020-01-06 blue 7 4
#7 2020-01-07 green 72 6
#8 2020-01-08 blue 76 4
#9 2020-01-09 green 52 6
#10 2020-01-10 blue 32 8
#11 2020-01-11 green 69 6
#12 2020-01-12 blue 10 8
#13 2020-01-13 green 63 6
#14 2020-01-14 blue 84 8
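As an alternative to merging and then filling, the same "most recent earlier day" lookup can be written as a rolling join. The sketch below assumes dplyr 1.1.0 or later, which provides join_by() and closest(). It is not part of the original answer. value_x is fixed to 1:14 instead of sampled, and the column name matched_day is a hypothetical name introduced only for illustration.

```r
library(dplyr)

# Same dates and keys as in the question; value_x is fixed (assumption)
# so the result is reproducible.
df1 <- tibble(
  day = seq(as.Date("2020-01-01"), as.Date("2020-01-14"), by = "1 day"),
  key = rep(c("green", "blue"), 7),
  value_x = 1:14
)
df2 <- tibble(
  day = seq(as.Date("2020-01-01"), as.Date("2020-01-12"), by = "3 days"),
  key = rep(c("green", "blue"), 2),
  value_y = c(2, 4, 6, 8)
)

# Rename df2's date column (hypothetical name) so both dates stay
# distinguishable after the join.
df2_renamed <- rename(df2, matched_day = day)

# For each df1 row, keep only the latest df2 row with the same key whose
# date is on or before the df1 date. Rows with no such match get NA.
rolled <- df1 %>%
  left_join(df2_renamed, by = join_by(key, closest(day >= matched_day)))

rolled$value_y
# 2 NA 2 4 2 4 6 4 6 8 6 8 6 8
```

Keeping matched_day in the result makes it easy to check which df2 date each row was matched to.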
For the updated data, you can use the following:
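df1 %>%
  left_join(df2, by = 'id') %>%             # pair each df1 row with every df2 row sharing its id
  mutate(diff = day.x - day.y) %>%          # time elapsed since the df2 date
  group_by(id, key, day.x) %>%
  filter(diff == min(diff[diff >= 0])) %>%  # keep the closest df2 date on or before day.x
  arrange(day.x) %>%
  select(day = day.x, id, key, value)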
# day id key value
# <date> <chr> <chr> <int>
#1 2020-01-08 A green 4
#2 2020-01-10 A green 4
#3 2020-02-24 A blue 4
#4 2020-03-24 A green 6
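One caveat, as far as I can tell: if a df1 day is earlier than every df2 day for its id, the filter() step has no non-negative diff to pick from, so that row is dropped instead of being kept with NA. A rolling left join keeps such rows. The sketch below assumes dplyr 1.1.0 or later for join_by() and closest(). The first df1 row (2020-01-01) is a hypothetical addition to the question's data to show the unmatched case, and matched_day is a hypothetical column name.

```r
library(dplyr)

df1 <- tibble(
  # 2020-01-01 is a hypothetical extra row with no earlier date in df2
  day = as.Date(c("2020-01-01", "2020-01-08", "2020-01-10",
                  "2020-02-24", "2020-03-24")),
  id  = "A",
  key = c("green", "green", "green", "blue", "green")
)
df2 <- tibble(
  day   = as.Date(c("2020-01-03", "2020-01-07", "2020-01-22", "2020-03-24")),
  id    = "A",
  value = c(2, 4, 4, 6)
)

# Match on id, then take the latest df2 date on or before each df1 date.
# Unmatched df1 rows are kept, with NA in the df2 columns.
rolled <- df1 %>%
  left_join(
    rename(df2, matched_day = day),
    by = join_by(id, closest(day >= matched_day))
  )

rolled$value
# NA 4 4 4 6
```

This also avoids building every df1/df2 pair per id before filtering, which may matter for larger tables.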