Better way to filter a data frame with dplyr using OR?

Question

I have a data frame in R with columns subject1 and subject2 (which contain Library of Congress subject headings). I'd like to filter the data frame by testing whether the subjects match an approved list. Say, for example, that I have this data frame.

data <- data.frame(
  subject1 = c("History", "Biology", "Physics", "Digital Humanities"),
  subject2 = c("Chemistry", "Religion", "Chemistry", "Religion")
)

And suppose this is the list of approved subjects.

condition <- c("History", "Religion")

What I want to do is filter by either subject1 or subject2:

subset <- filter(data, subject1 %in% condition | subject2 %in% condition)

That returns items 1, 2, and 4 from the original data frame, as desired.

Is that the best way to filter on multiple fields using OR rather than AND logic? It seems like there must be a better, more idiomatic way, but I don't know what it is.

A more general way to put the question: if I combine subject1 and subject2, is there a way to test whether any value in one vector matches any value in another vector? I'd like to write something like:

subset <- filter(data, c(subject1, subject2) %in% condition)
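As written, that expression cannot serve as a row filter, because concatenating the two columns yields a vector twice as long as the data frame has rows. The following is a minimal base R sketch of the same idea, assuming character columns (hence the added `stringsAsFactors = FALSE`); the names `combined`, `hits`, and `keep` are illustrative only.

```r
# Same example data as above; stringsAsFactors = FALSE is an assumption
# so that c() concatenates strings rather than factor codes on older R.
data <- data.frame(
  subject1 = c("History", "Biology", "Physics", "Digital Humanities"),
  subject2 = c("Chemistry", "Religion", "Chemistry", "Religion"),
  stringsAsFactors = FALSE
)
condition <- c("History", "Religion")

# Concatenating the columns gives 8 values for a 4-row data frame,
# so the result cannot be used directly as a row filter.
combined <- c(data$subject1, data$subject2) %in% condition
length(combined)

# Reshape to one row per record (column-major fill puts subject1 in
# column 1 and subject2 in column 2), then keep rows with any match.
hits <- matrix(combined, nrow = nrow(data))
keep <- rowSums(hits) > 0
data[keep, ]
```

The reshaping step is what turns "any value in the combined vector" back into "any value in this row".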

Solution

I'm not sure whether this approach is better. At least you don't have to write the column names:

library(dplyr)
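# rowSums() counts the matches per row; "> 0" turns that count into
# the logical condition that filter() expects
filter(data, rowSums(sapply(data, "%in%", condition)) > 0)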
#             subject1  subject2
# 1            History Chemistry
# 2            Biology  Religion
# 3 Digital Humanities  Religion
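
With a newer dplyr (1.0.4 or later), the same OR-across-columns filter can probably be written more directly with `if_any()`. This is a sketch under that version assumption, not part of the original answer; the result name `approved` is illustrative.

```r
library(dplyr)

data <- data.frame(
  subject1 = c("History", "Biology", "Physics", "Digital Humanities"),
  subject2 = c("Chemistry", "Religion", "Chemistry", "Religion"),
  stringsAsFactors = FALSE
)
condition <- c("History", "Religion")

# if_any() is TRUE for a row when the predicate holds in at least one
# of the selected columns (assumes dplyr >= 1.0.4).
approved <- filter(data, if_any(c(subject1, subject2), ~ .x %in% condition))
approved
```

Replacing `c(subject1, subject2)` with a selection helper such as `starts_with("subject")` or `everything()` avoids naming the columns, and `if_all()` gives the corresponding AND logic.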
