How to filter a column by multiple, flexible criteria

Problem description

I'm writing a function to aggregate a dataframe, and it needs to be generally applicable to a wide variety of datasets. One step in this function is dplyr's filter function, used to select from the data only the ad campaign types relevant to the task at hand. Since I need the function to be flexible, I want ad_campaign_types as an input, but this makes filtering kind of hairy, like so:

aggregate_data <- function(ad_campaign_types) {
  raw_data %>%
    filter(ad_campaign_type == ad_campaign_types) -> agg_data
  agg_data
}
new_data <- aggregate_data(ad_campaign_types = c("campaign_A", "campaign_B", "campaign_C"))

I would think the above would work, but while it runs, oddly enough it returns only a small fraction of what the filtered dataset should be. Is there a better way to do this?

Another tiny example of reproducible code:

ad_types <- c("a", "a", "a", "b", "b", "c", "c", "c", "d", "d")
revenue <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
data <- as.data.frame(cbind(ad_types, revenue))

# Now, filtering to select only ad types "a", "b", and "d",
# which should leave us with only 7 values
new_data <- filter(data, ad_types == c("a", "b", "d"))
nrow(new_data)
[1] 3

Solution

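== compares element by element, recycling the shorter vector, which is why only 3 rows came back. To match a column against several values, use the %in% operator instead: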

filter(data, ad_types %in% c("a", "b", "d"))
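
To see why == returned only 3 rows, it can help to look at the logical masks directly. The following is a minimal base R sketch using the vector from the question; the names `wanted`, `recycled`, `eq_mask` and `in_mask` are made up for illustration.

```r
ad_types <- c("a", "a", "a", "b", "b", "c", "c", "c", "d", "d")
wanted   <- c("a", "b", "d")

# == recycles `wanted` to the length of `ad_types`:
# "a" "b" "d" "a" "b" "d" "a" "b" "d" "a"
# (R also warns because 10 is not a multiple of 3)
recycled <- rep_len(wanted, length(ad_types))
eq_mask  <- suppressWarnings(ad_types == wanted)

# %in% asks, for each element, whether it appears anywhere in `wanted`
in_mask <- ad_types %in% wanted

sum(eq_mask)  # 3 positions happen to line up with the recycled values
sum(in_mask)  # 7 elements are "a", "b" or "d"
```

The == version only matches rows where the column value coincides with whatever recycled value sits at the same position, so its result depends on row order.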

You can also use a "not in" criterion:

filter(data, !(ad_types %in% c("a", "b", "d")))
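
Both forms drop straight into the function from the question. The sketch below is one possible rewrite, not the asker's actual code: passing `raw_data` as an argument, the `exclude` flag, and the sample data frame are all assumptions added for illustration.

```r
library(dplyr)

# Hypothetical rewrite: `raw_data` is passed in rather than read from the
# global environment, and `exclude` (an invented flag) switches to "not in".
aggregate_data <- function(raw_data, ad_campaign_types, exclude = FALSE) {
  if (exclude) {
    filter(raw_data, !(ad_campaign_type %in% ad_campaign_types))
  } else {
    filter(raw_data, ad_campaign_type %in% ad_campaign_types)
  }
}

# Made-up sample data, only to exercise the function
raw_data <- data.frame(
  ad_campaign_type = c("campaign_A", "campaign_B", "campaign_C",
                       "campaign_D", "campaign_A"),
  revenue = c(10, 20, 30, 40, 50)
)

kept    <- aggregate_data(raw_data, c("campaign_A", "campaign_C"))
dropped <- aggregate_data(raw_data, c("campaign_A", "campaign_C"), exclude = TRUE)
nrow(kept)     # 3
nrow(dropped)  # 2
```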

However, note that %in% treats NA differently from ==:

> c(2, NA) == 2
[1] TRUE   NA
> c(2, NA) %in% 2
[1]  TRUE FALSE

Some find one of these more intuitive than the other, but you have to keep the difference in mind.
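
This difference matters inside filter, because filter keeps only rows where the condition is TRUE and drops rows where it is NA. A small sketch with a made-up data frame `df` shows how the two negated forms diverge:

```r
library(dplyr)

# Made-up data with one missing value
df <- data.frame(x = c(2, NA, 3))

# x != 2 evaluates to FALSE, NA, TRUE: the NA row is dropped
via_neq <- filter(df, x != 2)

# !(x %in% 2) evaluates to FALSE, TRUE, TRUE: the NA row is kept
via_not_in <- filter(df, !(x %in% 2))

nrow(via_neq)     # 1
nrow(via_not_in)  # 2
```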

As for combining several different criteria, simply chain them with and/or operators (& and |):

filter(mtcars, cyl > 2 & wt < 2.5 & gear == 4)
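
As a rough sketch of the options for combining conditions: in dplyr, separate arguments to filter are combined with AND, so they behave like a chain of &, while | has to be written explicitly. The variable names below are illustrative only.

```r
library(dplyr)

# Explicit & chain, as in the answer
with_amp <- filter(mtcars, cyl > 2 & wt < 2.5 & gear == 4)

# Comma-separated conditions are combined with AND as well
with_commas <- filter(mtcars, cyl > 2, wt < 2.5, gear == 4)

# OR must be spelled out with |
either <- filter(mtcars, cyl == 4 | gear == 5)

# %in% can be mixed with the other conditions
mixed <- filter(mtcars, gear %in% c(4, 5), wt < 2.5)
```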
