Filter by ranges supplied by two vectors, without a join operation
Question
I want to do exactly what is described in "Take dates from one dataframe and filter data in another dataframe - R", except without joining: I am afraid that the joined data will be too big to fit in memory before the filter is applied.
Here is sample data:
tmp_df <- data.frame(a = 1:10)
I would like to do the following:
lower_bound <- c(2, 4)
upper_bound <- c(2, 5)
tmp_df %>%
filter(a >= lower_bound & a <= upper_bound) # does not work as <= is vectorised inappropriately
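To make the failure concrete, here is a small base R sketch of what the comparison above actually computes. R recycles the length-2 bound vectors along the 10 rows, so each row is tested against only one interval, chosen by its position. The names recycled_lower, recycled_upper, mask and explicit are illustrative only.

```r
tmp_df <- data.frame(a = 1:10)
lower_bound <- c(2, 4)
upper_bound <- c(2, 5)

# What the elementwise comparison does: the bounds are recycled to length 10,
# so row i is compared with interval ((i - 1) %% 2) + 1 only.
recycled_lower <- rep(lower_bound, length.out = nrow(tmp_df))
recycled_upper <- rep(upper_bound, length.out = nrow(tmp_df))

mask     <- tmp_df$a >= lower_bound    & tmp_df$a <= upper_bound
explicit <- tmp_df$a >= recycled_lower & tmp_df$a <= recycled_upper

identical(mask, explicit)  # TRUE: both spell out the same recycling
tmp_df$a[mask]             # 4: rows 2 and 5 are lost
```

Rows 2 and 5 are dropped because they happen to be paired with the "wrong" interval, not because they fall outside every range.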
My desired result is:
> tmp_df[(tmp_df$a <= 2 & tmp_df$a >= 2) | (tmp_df$a <= 5 & tmp_df$a >= 4), , drop = F]
# one way to get indices to subset data frame, impractical for a long range vector
a
2 2
4 4
5 5
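The hand-written condition above can be generalised to any number of intervals without building a join. The following is only a base R sketch: in_any_range is a hypothetical helper, not a function from any package, and it assumes lower and upper have the same length. It keeps a single logical vector as long as the data and ORs in one interval at a time, so no table of size rows x intervals is ever built.

```r
tmp_df <- data.frame(a = 1:10)
lower_bound <- c(2, 4)
upper_bound <- c(2, 5)

# Hypothetical helper: TRUE where x lies in at least one closed
# interval [lower[i], upper[i]].
in_any_range <- function(x, lower, upper) {
  stopifnot(length(lower) == length(upper))
  keep <- logical(length(x))  # starts as all FALSE
  for (i in seq_along(lower)) {
    keep <- keep | (x >= lower[i] & x <= upper[i])
  }
  keep
}

result <- tmp_df[in_any_range(tmp_df$a, lower_bound, upper_bound), , drop = FALSE]
result$a  # 2 4 5
```

Because the helper returns a plain logical vector, it can also be used inside a dplyr filter() call within a pipe. Its running time grows with the number of rows times the number of intervals, so it trades speed for a small memory footprint.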
My concern about memory (with respect to the linked join solution) arises when tmp_df has many more rows and the lower_bound and upper_bound vectors have many more entries. A dplyr solution, or one that can be part of a pipe, is preferred.
Recommended answer
Maybe you could borrow the inrange function from data.table, which
checks whether each value in x is in between any of the intervals provided in lower,upper.
Usage:
inrange(x,lower,upper,incbounds = TRUE)
library(dplyr); library(data.table)
tmp_df %>% filter(inrange(a, c(2,4), c(2,5)))
# a
#1 2
#2 4
#3 5
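As a variation, here is a sketch that calls inrange through its namespace, so that data.table is not attached and its between, first and last do not mask the dplyr functions of the same name. It assumes both packages are installed. The second call illustrates the incbounds argument from the usage line; the bounds c(1, 3) and c(3, 6) are made up for the illustration.

```r
library(dplyr)

tmp_df <- data.frame(a = 1:10)
lower_bound <- c(2, 4)
upper_bound <- c(2, 5)

# Closed intervals [2, 2] and [4, 5] (incbounds = TRUE is the default).
closed <- tmp_df %>%
  filter(data.table::inrange(a, lower_bound, upper_bound))

# Open intervals (1, 3) and (3, 6): the endpoints themselves are excluded.
open <- tmp_df %>%
  filter(data.table::inrange(a, c(1, 3), c(3, 6), incbounds = FALSE))

closed$a  # 2 4 5
open$a    # 2 4 5
```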