dplyr: left_join where df A value lies between df B values


Question

I'd like to know whether the following can be achieved using dplyr or another tidyverse package.

Context: I am having trouble getting my data into a structure that allows the use of geom_rect. See this SO question for the motivation.

library(tis)     # provides nberDates()
library(dplyr)   # provides tibble() and %>%

# Prepare NBER recession start and end dates.
recessions <- data.frame(start = as.Date(as.character(nberDates()[, "Start"]), "%Y%m%d"),
                         end   = as.Date(as.character(nberDates()[, "End"]), "%Y%m%d"))

dt <- tibble(date = as.Date(c("1983-01-01", "1990-10-15", "1993-01-01")))

Desired output:

date       start      end
1983-01-01 NA         NA
1990-10-15 1990-08-01 1991-03-31
1993-01-01 NA         NA

Any suggestions are appreciated.

Note: Previous questions indicate that sqldf is one approach to take. However, the data here involves dates, and my understanding is that date is not a data type in SQLite.

In the spirit of 'write the code you wish you had':

df <- dt %>%
      left_join(x=., y=recessions, date >= start & date <= end)


Recommended answer

The following uses only dplyr and produces the desired data frame result. Note: on larger datasets you will likely run into memory issues, in which case the sqldf approach proposed by G. Grothendieck will work.
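The memory concern comes from the intermediate table: joining on a constant dummy column pairs every date with every recession before any filtering happens. A rough sketch of that growth follows. The sizes `n_dates` and `n_recessions` are hypothetical, and the two small tables `a` and `b` are made up for illustration only.

```r
library(dplyr)

# Hypothetical sizes, chosen only to illustrate how the row count grows.
n_dates      <- 1e5   # rows in a larger `dt`
n_recessions <- 30    # rows in `recessions`

# A join on a constant column produces every date/recession pair.
n_pairs <- n_dates * n_recessions

# Small demonstration that the dummy join behaves as a cross join.
a <- tibble(date  = as.Date(c("1983-01-01", "1990-10-15", "1993-01-01")),
            dummy = TRUE)
b <- tibble(start = as.Date(c("1981-08-01", "1990-08-01")),
            end   = as.Date(c("1982-11-30", "1991-03-31")),
            dummy = TRUE)

# suppressWarnings() hides the many-to-many warning that recent dplyr versions emit.
pairs <- suppressWarnings(left_join(a, b, by = "dummy"))
nrow(pairs)  # 3 dates x 2 recessions = 6 rows
```

With these assumed sizes the intermediate result holds three million rows, almost all of which the subsequent `filter()` discards.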

Hat tip: @nick-criswell for steering me to @ian-gow for this partial solution.

# Build data frame of dates within the interval [start, end].
# Joining on the constant `dummy` column pairs every date with every recession
# (recent dplyr versions may warn about a many-to-many relationship here).
df1 <- dt %>%
        mutate(dummy = TRUE) %>%
        left_join(recessions %>% mutate(dummy = TRUE), by = "dummy") %>%
        filter(date >= start & date <= end) %>%
        select(-dummy)

# Build data frame of all dates with start = NA and end = NA.
df2 <- dt %>%
        mutate(dummy = TRUE) %>%
        left_join(recessions %>% mutate(dummy = TRUE), by = "dummy") %>%
        mutate(start = NA, end = NA) %>%
        unique() %>%
        select(-dummy)

# Now merge the two. Overwrite NA values with start and end dates where available.
df <- df2 %>%
      left_join(x = ., y = df1, by = "date") %>%
      mutate(start = ifelse(is.na(start.y), as.character(start.x), as.character(start.y)),
             end   = ifelse(is.na(end.y), as.character(end.x), as.character(end.y))) %>%
      mutate(start = as.Date(start), end = as.Date(end)) %>%
      select(date, start, end)  # drop the suffixed helper columns left by the join

> df
# A tibble: 3 x 3
        date      start        end
      <date>     <date>     <date>
1 1983-01-01         NA         NA
2 1990-10-15 1990-08-01 1991-03-31
3 1993-01-01         NA         NA
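
For readers on a newer dplyr, the same result can be sketched more directly. This is not part of the original answer: it assumes dplyr 1.1.0 or later, where `join_by()` accepts a `between()` helper for range joins. To stay self-contained it replaces `nberDates()` with a small hand-typed `recessions` table, which is illustrative rather than the full NBER list.

```r
library(dplyr)  # assumes dplyr >= 1.1.0, which introduced join_by()

# Illustrative subset of recession intervals (an assumption, not nberDates()).
recessions <- tibble(
  start = as.Date(c("1981-08-01", "1990-08-01", "2001-04-01")),
  end   = as.Date(c("1982-11-30", "1991-03-31", "2001-11-30"))
)

dt <- tibble(date = as.Date(c("1983-01-01", "1990-10-15", "1993-01-01")))

# Keep every row of dt; attach start/end only where start <= date <= end.
df <- dt %>%
  left_join(recessions, by = join_by(between(date, start, end)))

df
```

Because this is a left join, dates that fall outside every interval keep `NA` in `start` and `end`, and no cross-joined intermediate table is built. If intervals overlapped, a date could match more than one row.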
