dplyr: left_join where df A value lies between df B values
Question
I'd like to know if it is possible to achieve the following using dplyr or another tidyverse package.
Context: I am having trouble getting my data into a structure that will allow the use of geom_rect. See this SO question for the motivation.
library(tis)    # provides nberDates()
library(dplyr)  # provides tibble() and %>%
# Prepare NBER recession start and end dates.
recessions <- data.frame(
  start = as.Date(as.character(nberDates()[, "Start"]), "%Y%m%d"),
  end   = as.Date(as.character(nberDates()[, "End"]), "%Y%m%d")
)
dt <- tibble(date = as.Date(c("1983-01-01", "1990-10-15", "1993-01-01")))
Desired output:
date start end
1983-01-01 NA NA
1990-10-15 1990-08-01 1991-03-31
1993-01-01 NA NA
Any suggestions are appreciated.
Note: Previous questions indicate that sqldf is one approach to take. However, the data here involves dates, and my understanding is that date is not a data type in SQLite.
In the spirit of 'write the code you wish you had':
df <- dt %>%
left_join(x=., y=recessions, date >= start & date <= end)
Recommended answer
The following uses only dplyr and produces the desired data frame result.
Note: This approach pairs every row of dt with every row of recessions before filtering, so on larger datasets you will likely run into memory issues. In that case the sqldf approach proposed by G. Grothendieck will work.
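To make the memory concern concrete, here is a minimal sketch of how large the intermediate table gets. It assumes dplyr >= 1.1.0 for cross_join(), which is used here as a stand-in for the dummy-column join in this answer, and it uses a small hand-typed recessions table instead of tis::nberDates(). The first recession row is an illustrative extra; the second is taken from the desired output in the question.

```r
library(dplyr)  # cross_join() assumes dplyr >= 1.1.0

# Stand-in recession table (assumption: hand-typed, not read from tis::nberDates()).
recessions <- tibble(
  start = as.Date(c("1981-08-01", "1990-08-01")),
  end   = as.Date(c("1982-11-30", "1991-03-31"))
)
dt <- tibble(date = as.Date(c("1983-01-01", "1990-10-15", "1993-01-01")))

# Every date is paired with every recession before any filtering happens.
pairs <- cross_join(dt, recessions)

# The intermediate row count is the product of the two input row counts.
print(nrow(pairs))
print(nrow(dt) * nrow(recessions))
```

With a few thousand dates and a few dozen intervals this is still small, but the product grows quickly when both tables are large.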
Hat tip: @nick-criswell, who pointed me to @ian-gow and this partial solution.
# Build data frame of dates within the interval [start, end]
df1 <- dt %>%
mutate(dummy=TRUE) %>%
left_join(recessions %>% mutate(dummy=TRUE), by="dummy") %>%
filter(date >= start & date <= end) %>%
select(-dummy)
# Build data frame of all other dates with start=NA and end=NA
df2 <- dt %>%
mutate(dummy=TRUE) %>%
left_join(recessions %>% mutate(dummy=TRUE), by="dummy") %>%
mutate(start=NA, end=NA) %>%
unique() %>%
select(-dummy)
# Now merge the two. Overwrite NA values with start and end dates
df <- df2 %>%
left_join(x=., y=df1, by="date") %>%
# transmute() keeps only date, start and end, dropping the .x/.y helper columns
transmute(date,
  start = ifelse(is.na(start.y), as.character(start.x), as.character(start.y)),
  end = ifelse(is.na(end.y), as.character(end.x), as.character(end.y))) %>%
mutate(start=as.Date(start), end=as.Date(end) )
> df
# A tibble: 3 x 3
date start end
<date> <date> <date>
1 1983-01-01 NA NA
2 1990-10-15 1990-08-01 1991-03-31
3 1993-01-01 NA NA
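As a possible shortcut, newer dplyr releases can express the range condition directly in the join, which is close to the "code you wish you had" in the question. The sketch below assumes dplyr >= 1.1.0, where join_by() and its between() helper are available; it is not part of the original answer. The recessions table is a hand-typed stand-in for tis::nberDates(): the first row is an illustrative extra, the second comes from the output above.

```r
library(dplyr)  # join_by() assumes dplyr >= 1.1.0

# Stand-in recession table (assumption: hand-typed, not read from tis::nberDates()).
recessions <- tibble(
  start = as.Date(c("1981-08-01", "1990-08-01")),
  end   = as.Date(c("1982-11-30", "1991-03-31"))
)
dt <- tibble(date = as.Date(c("1983-01-01", "1990-10-15", "1993-01-01")))

# Keep every row of dt; attach start/end only where start <= date <= end.
# Dates that fall in no interval get NA for both columns.
df <- left_join(dt, recessions, by = join_by(between(date, start, end)))
print(df)
```

Because the condition is part of the join itself, no dummy column or second pass to restore the unmatched dates is needed.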