How to most efficiently filter a dataframe conditionally on values in another one, in the tidyverse framework?

Posted: 2021/6/23 19:10:52 · Tags: r, tidyverse, purrr

Problem description

I have a dataframe df1 with an ID column and a lubridate time-interval column, and I want to filter (subsample) a dataframe df2, which has ID and DateTime columns, so that only the df2 rows whose DateTime falls within the interval of the corresponding ID in df1 are kept. I want to do this within the tidyverse framework.
It can easily be done with a join (see the example below), but I would like to know whether there is a more direct solution (maybe purrr-based) that avoids joining and then removing the time-interval data from the second dataframe. Thanks.

The question "Merge two dataframes if timestamp of x is within time interval of y" is close to this one, but the proposed solutions were similar to the one I developed and were not in a tidyverse framework.

Minimal code showing the problem and my current solution:

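library(tibble)
library(dplyr)      # mutate(), left_join(), filter(), select() and %>%
library(lubridate)  # ymd(), interval(), %within%

# One observation window per ID
df1 <- tribble(
  ~ID, ~Date1, ~Date2,
  "ID1", "2018-04-16", "2018-06-14",
  "ID2", "2018-04-20", "2018-06-25")
df1 <- mutate(df1, Interval = interval(ymd(Date1), ymd(Date2)))

# Dated records to be filtered against the window of their ID
df2 <- tribble(
  ~ID, ~DateTime,
  "ID1", "2018-04-12",
  "ID1", "2018-05-05",
  "ID2", "2018-04-23",
  "ID2", "2018-07-12")
df2 <- mutate(df2, DateTime = ymd(DateTime))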

df1 looks like this:

> df1
# A tibble: 2 x 4
  ID    Date1      Date2      Interval                      
  <chr> <chr>      <chr>      <S4: Interval>                
1 ID1   2018-04-16 2018-06-14 2018-04-16 UTC--2018-06-14 UTC
2 ID2   2018-04-20 2018-06-25 2018-04-20 UTC--2018-06-25 UTC

and df2 like this:

> df2
# A tibble: 4 x 2
  ID    DateTime  
  <chr> <date>    
1 ID1   2018-04-12
2 ID1   2018-05-05
3 ID2   2018-04-23
4 ID2   2018-07-12

In df2, the first record for ID1 (2018-04-12) falls before the ID1 interval in df1, and the second record for ID2 (2018-07-12) falls after the ID2 interval in df1, so both rows should be dropped.
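
One way to check this row by row, without any join, is to look up the position of each df2 ID in df1 with base R `match()` and test each date against the interval at that position. This is a sketch that repeats the df1/df2 construction so it runs on its own; `idx` and `inside` are illustrative names, not part of the original code.

```r
library(tibble)
library(dplyr)
library(lubridate)

df1 <- tribble(
  ~ID, ~Date1, ~Date2,
  "ID1", "2018-04-16", "2018-06-14",
  "ID2", "2018-04-20", "2018-06-25")
df1 <- mutate(df1, Interval = interval(ymd(Date1), ymd(Date2)))

df2 <- tribble(
  ~ID, ~DateTime,
  "ID1", "2018-04-12",
  "ID1", "2018-05-05",
  "ID2", "2018-04-23",
  "ID2", "2018-07-12")
df2 <- mutate(df2, DateTime = ymd(DateTime))

# Row of df1 holding the ID of each df2 row: here c(1, 1, 2, 2)
idx <- match(df2$ID, df1$ID)

# Element-wise test: each df2 date against the interval of its own ID
inside <- df2$DateTime %within% df1$Interval[idx]
print(inside)
# FALSE TRUE TRUE FALSE: rows 1 and 4 fall outside their ID's interval
```

The same logical vector could be used directly as a filter condition, e.g. `filter(df2, DateTime %within% df1$Interval[match(ID, df1$ID)])`, which keeps only df2's own columns. An ID absent from df1 should yield an `NA` interval, and `filter()` drops rows whose condition is `NA`.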

My current solution, based on joining and then removing the joined column, is:

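df_out <- df2 %>%
  left_join(df1, by = "ID") %>%            # attach each ID's Date1, Date2 and Interval
  filter(DateTime %within% Interval) %>%   # keep rows inside their ID's interval
  select(-Interval)                        # drop the joined Interval column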

> df_out
# A tibble: 2 x 4
  ID    DateTime   Date1      Date2     
  <chr> <date>     <chr>      <chr>     
1 ID1   2018-05-05 2018-04-16 2018-06-14
2 ID2   2018-04-23 2018-04-20 2018-06-25
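
If dplyr 1.1.0 or later is available (it postdates this question), a filtering join expresses the same rule without bringing df1's columns into the result: `semi_join()` keeps the df2 rows that have at least one matching df1 row and never copies df1 columns, while `join_by()` with `between()` encodes Date1 <= DateTime <= Date2 (both bounds inclusive by default, like `%within%`). This sketch assumes the bounds are kept as `Date` columns instead of a lubridate `Interval`; it is still a join, but one that needs no clean-up afterwards.

```r
library(tibble)
library(dplyr)      # join_by() with between() needs dplyr >= 1.1.0
library(lubridate)

# Assumption: store the interval bounds as Date columns instead of an Interval
df1 <- tribble(
  ~ID, ~Date1, ~Date2,
  "ID1", "2018-04-16", "2018-06-14",
  "ID2", "2018-04-20", "2018-06-25")
df1 <- mutate(df1, Date1 = ymd(Date1), Date2 = ymd(Date2))

df2 <- tribble(
  ~ID, ~DateTime,
  "ID1", "2018-04-12",
  "ID1", "2018-05-05",
  "ID2", "2018-04-23",
  "ID2", "2018-07-12")
df2 <- mutate(df2, DateTime = ymd(DateTime))

# Keep df2 rows that have a df1 row with the same ID and Date1 <= DateTime <= Date2;
# a semi join returns only df2's columns, so nothing has to be dropped afterwards
df_out <- semi_join(df2, df1, by = join_by(ID, between(DateTime, Date1, Date2)))
df_out
```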

I have the feeling that a tidyverse alternative should exist that avoids joining and then removing the Interval column.
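
A purrr-based variant, along the lines the question asks about, avoids the join entirely: `purrr::map2_lgl()` walks df2's DateTime and ID columns in parallel and, for each row, tests the date against the interval(s) stored for that ID in df1. This is a sketch rather than a benchmarked recommendation: it subsets df1 once per df2 row, so on large tables a vectorized lookup such as `match()` is likely to be faster.

```r
library(tibble)
library(dplyr)
library(purrr)
library(lubridate)

df1 <- tribble(
  ~ID, ~Date1, ~Date2,
  "ID1", "2018-04-16", "2018-06-14",
  "ID2", "2018-04-20", "2018-06-25")
df1 <- mutate(df1, Interval = interval(ymd(Date1), ymd(Date2)))

df2 <- tribble(
  ~ID, ~DateTime,
  "ID1", "2018-04-12",
  "ID1", "2018-05-05",
  "ID2", "2018-04-23",
  "ID2", "2018-07-12")
df2 <- mutate(df2, DateTime = ymd(DateTime))

# For each df2 row (.x = DateTime, .y = ID), test the date against the
# interval(s) of that ID in df1. any() gives FALSE when the ID is absent
# from df1, so map2_lgl() always receives exactly one TRUE/FALSE per row.
df_out <- df2 %>%
  filter(map2_lgl(DateTime, ID,
                  ~ any(.x %within% df1$Interval[df1$ID == .y])))
df_out
```

Unlike the left join, the result keeps only df2's columns, so no `select()` step is needed.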