Filter by ranges supplied by two vectors, without a join operation

Question

I want to do what is described in this question: Take dates from one dataframe and filter data in another dataframe - R

except without joining, because I am afraid that the joined result will be too big to fit in memory before the filter is applied.

Here is some example data:

tmp_df <- data.frame(a = 1:10)

I wish to do an operation that looks like this:

lower_bound <- c(2, 4)
upper_bound <- c(2, 5)
tmp_df %>%
    filter(a >= lower_bound & a <= upper_bound) # does not work as <= is vectorised inappropriately
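
To make the "vectorised inappropriately" remark concrete, here is a small base R sketch of what the comparison actually computes. The names `recycled_lower`, `recycled_upper`, `naive` and `explicit` are illustrative only and not part of the original question. R recycles the length-2 bound vectors along `a`, so each element is tested against a single bound pair rather than against every range.

```r
a <- 1:10
lower_bound <- c(2, 4)
upper_bound <- c(2, 5)

# Recycling spelled out: element i of `a` is paired with bound number
# ((i - 1) %% 2) + 1 only, not with every (lower, upper) pair.
recycled_lower <- rep_len(lower_bound, length(a))
recycled_upper <- rep_len(upper_bound, length(a))

naive    <- a >= lower_bound & a <= upper_bound
explicit <- a >= recycled_lower & a <= recycled_upper

identical(naive, explicit)  # TRUE: both expressions perform the same test
a[naive]                    # 4: the values 2 and 5 are missed
```

Because 10 is a multiple of 2, the recycling happens without a warning, which is why the filter silently returns the wrong rows instead of failing.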

The result I want is:

> tmp_df[(tmp_df$a <= 2 & tmp_df$a >= 2) | (tmp_df$a <= 5 & tmp_df$a >= 4), , drop = F] 
# one way to get indices to subset data frame, impractical for a long range vector
  a
2 2
4 4
5 5
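
The hand-written condition above can be generalised to bound vectors of any length without a join. The following is a minimal base R sketch, not part of the original post: `in_any_range` is a hypothetical helper name, and it assumes closed intervals and no `NA` values in the inputs. It accumulates one logical vector of length `length(x)`, so memory stays proportional to the number of rows, at the cost of one pass over the data per range.

```r
# Hypothetical helper (not from any package): TRUE where x falls inside
# at least one closed interval [lower[i], upper[i]].
in_any_range <- function(x, lower, upper) {
  stopifnot(length(lower) == length(upper))
  keep <- logical(length(x))  # starts as all FALSE
  for (i in seq_along(lower)) {
    # OR the membership test for range i into the running result
    keep <- keep | (x >= lower[i] & x <= upper[i])
  }
  keep
}

tmp_df <- data.frame(a = 1:10)
lower_bound <- c(2, 4)
upper_bound <- c(2, 5)

# Rows whose `a` lies in [2, 2] or [4, 5]
tmp_df[in_any_range(tmp_df$a, lower_bound, upper_bound), , drop = FALSE]
```

Since the helper returns a plain logical vector, it should also work inside a pipe, for example as the condition passed to `filter()`. With many ranges, the loop may become slow, because the running time grows with rows times ranges.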

My concern about memory requirements (with respect to the linked join solution) arises when tmp_df has many more rows and the lower_bound and upper_bound vectors have many more entries. A dplyr solution, or a solution that can be part of a pipe, is preferred.

Recommended answer

Maybe you could borrow the inrange function from data.table, which

checks whether each value in x is in between any of the intervals provided in lower, upper.

Usage:

inrange(x,lower,upper,incbounds=TRUE)

library(dplyr); library(data.table)

tmp_df %>% filter(inrange(a, c(2,4), c(2,5)))
#  a
#1 2
#2 4
#3 5
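
The `incbounds` argument in the usage line decides whether the interval endpoints count as matches. The sketch below is an illustration on made-up values (the vector `x` and the variable names are not from the original answer), based on the documented meaning of `incbounds`: `TRUE` for closed intervals, `FALSE` for open ones.

```r
library(data.table)

x <- c(1, 2, 3, 4, 5)

# Closed interval [2, 4]: the endpoints 2 and 4 count as inside.
closed <- inrange(x, lower = 2, upper = 4, incbounds = TRUE)

# Open interval (2, 4): only values strictly between the bounds match.
open <- inrange(x, lower = 2, upper = 4, incbounds = FALSE)

x[closed]
x[open]
```

This matters for the data in the question: a degenerate range such as lower 2 and upper 2 can only match with inclusive bounds, so the default `incbounds = TRUE` is the setting that reproduces the desired result.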
