Subset multiple columns in R with multiple matches
Question
I want to do something similar to what is discussed in this thread: Subset multiple columns in R - more elegant code?
I have data like this:
df=data.frame(x=1:4,Col1=c("A","A","C","B"),Col2=c("A","B","B","A"),Col3=c("A","C","C","A"))
criteria="A"
What I want to do is subset the data to the rows where criteria is met in at least two columns, that is, where the string in at least two of the three columns (Col1, Col2, Col3) is A. In the case above, the subset would be the first and last rows of the data frame df.
Recommended answer
You can use rowSums:
df[rowSums(df[-1] == criteria) >= 2, ]
# x Col1 Col2 Col3
#1 1 A A A
#4 4 B A A
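To make the one-liner above easier to follow, here is a minimal sketch that splits it into its intermediate steps. The variable names matches, n_matches, and result are illustrative and do not appear in the original answer; the data is the example from the question.

```r
# Same example data as in the question
df <- data.frame(
  x = 1:4,
  Col1 = c("A", "A", "C", "B"),
  Col2 = c("A", "B", "B", "A"),
  Col3 = c("A", "C", "C", "A")
)
criteria <- "A"

# Step 1: compare every column except the first (x) with the criteria.
# This yields a logical matrix with one row per row of df.
matches <- df[-1] == criteria

# Step 2: count the TRUE values in each row (TRUE counts as 1).
n_matches <- rowSums(matches)

# Step 3: keep only the rows with at least two matching columns.
result <- df[n_matches >= 2, ]
print(result)
```

Dropping the first column with df[-1] keeps the numeric x column out of the comparison; if the data had other non-candidate columns, you would presumably select the relevant columns by name instead.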
If criteria has length > 1, you cannot use == directly, because == compares element by element (recycling the shorter vector) rather than testing membership. In that case, use sapply with %in%:
df[rowSums(sapply(df[-1], `%in%`, criteria)) >= 2, ]
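As a small illustration of the multi-value case, the sketch below reuses the example data with a two-element criteria vector. The value c("A", "B") and the names matches and result are assumptions chosen for the example, not part of the original question.

```r
# Same example data as in the question
df <- data.frame(
  x = 1:4,
  Col1 = c("A", "A", "C", "B"),
  Col2 = c("A", "B", "B", "A"),
  Col3 = c("A", "C", "C", "A")
)

# Hypothetical criteria with more than one accepted value
criteria <- c("A", "B")

# For each candidate column, test which entries are among the accepted values.
# sapply() binds the per-column logical vectors into a matrix (rows = rows of df).
matches <- sapply(df[-1], `%in%`, criteria)

# Keep rows where at least two columns hold an accepted value.
result <- df[rowSums(matches) >= 2, ]
print(result)
```

With this criteria, a row qualifies when at least two of its three columns contain either A or B, so the third row (C, B, C) is the only one dropped.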
In dplyr, you can use filter with rowwise:
library(dplyr)
df %>%
rowwise() %>%
filter(sum(c_across(starts_with('col')) %in% criteria) >= 2)