Filter by ranges supplied by two vectors, without a join operation


Question

I wish to do what is described in "Take dates from one dataframe and filter data in another dataframe - R", except without joining, as I am afraid that after I join my data the result will be too big to fit in memory, prior to the filter.

Here is sample data:

tmp_df <- data.frame(a = 1:10)

I wish to do the following:

lower_bound <- c(2, 4)
upper_bound <- c(2, 5)
tmp_df %>%
    filter(a >= lower_bound & a <= upper_bound) # does not work as <= is vectorised inappropriately
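To see why the attempt above does not give the desired rows, here is a minimal base-R sketch of the same comparison outside of filter(). It assumes the sample values from this question; the name recycled is only illustrative. The two bound vectors are recycled along a, so each element is compared against a single pair of bounds rather than against every interval.

```r
a <- 1:10
lower_bound <- c(2, 4)
upper_bound <- c(2, 5)

# The bounds are recycled element-wise along a:
# odd positions are tested against [2, 2], even positions against [4, 5].
recycled <- a >= lower_bound & a <= upper_bound

a[recycled]
# [1] 4
```

Only 4 survives: 2 sits at an even position and is tested against [4, 5], while 5 sits at an odd position and is tested against [2, 2].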

My desired result is:

> tmp_df[(tmp_df$a <= 2 & tmp_df$a >= 2) | (tmp_df$a <= 5 & tmp_df$a >= 4), , drop = F] 
# one way to get indices to subset data frame, impractical for a long range vector
  a
2 2
4 4
5 5

My concern about memory requirements (with respect to the linked join solution) arises when tmp_df has many more rows and the lower_bound and upper_bound vectors have many more entries. A dplyr solution, or a solution that can be part of a pipe, is preferred.

Recommended answer

Maybe you could borrow the inrange function from data.table, which


checks whether each value in x is in between any of the intervals provided in lower,upper.

Usage:

inrange(x,lower,upper,incbounds = TRUE)

library(dplyr); library(data.table)

tmp_df %>% filter(inrange(a, c(2,4), c(2,5)))
#  a
#1 2
#2 4
#3 5
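If loading data.table is not an option, the same idea can be sketched in base R. The helper below, in_any_range, is a hypothetical name introduced here for illustration (it is not part of dplyr or data.table), and it assumes closed intervals, matching incbounds = TRUE. It ORs in one interval at a time, so it never builds a rows-by-intervals table the way a join would, although it is likely slower than inrange when there are many intervals.

```r
# Hypothetical helper (not from dplyr or data.table): TRUE where x falls
# inside at least one closed interval [lower[i], upper[i]].
in_any_range <- function(x, lower, upper) {
  stopifnot(length(lower) == length(upper))
  keep <- logical(length(x))  # starts as all FALSE
  for (i in seq_along(lower)) {
    # Combine one interval at a time; only vectors of length(x) are held.
    keep <- keep | (x >= lower[i] & x <= upper[i])
  }
  keep
}

tmp_df <- data.frame(a = 1:10)
result <- tmp_df[in_any_range(tmp_df$a, c(2, 4), c(2, 5)), , drop = FALSE]
result
#   a
# 2 2
# 4 4
# 5 5
```

Because the helper returns a logical vector with one entry per row, it can be passed to filter() inside a pipe in the same position as inrange above.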
