Find multiple strings in entire dataframe
Problem description
I am trying to find multiple strings in my dataframe, using the which function. I am trying to extend the answer from Find string in data.frame
The sample data frame is:
df1 <- data.frame(animal = c("a", "b", "c", "two", "five", "c"), level = c("five", "one", "three", 30, "horse", "five"), length = c(10, 20, 30, "horse", "eight", "c"))
  animal level length
1      a  five     10
2      b   one     20
3      c three     30
4    two    30  horse
5   five horse  eight
6      c  five      c
On this dataframe, when I apply the which function for one string, I get the correct output, e.g.
which(df1 == "c", arr.ind = T); df1
gives:
row col
[1,] 3 1
[2,] 6 1
[3,] 6 3
But when I try to search for multiple strings, I get only a partially correct output e.g.
which(df1 == c("c", "horse", "five"), arr.ind = T)
row col
[1,] 5 2
[2,] 6 2
The expected output should be:
row col
[1,] 3 1
[2,] 5 1
[3,] 6 1
[4,] 1 2
[5,] 5 2
[6,] 6 2
[7,] 4 3
[8,] 6 3
Hence my questions:
Why does the solution with c("c", "horse", "five") not work?
I tried
which(df1 == "c" | df1 == "horse" | df1 == "five", arr.ind = T)
which gives me the correct output, but for many strings it is too lengthy. How can I make my code succinct?
Recommended answer
We can loop through the vector with lapply, do the == for each string, Reduce the results to a single logical matrix with |, and wrap with which
which(Reduce(`|`, lapply(c("c", "horse", "five"), `==`, df1)), arr.ind = TRUE)
# row col
#[1,] 3 1
#[2,] 5 1
#[3,] 6 1
#[4,] 1 2
#[5,] 5 2
#[6,] 6 2
#[7,] 4 3
#[8,] 6 3
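As for why the single comparison against c("c", "horse", "five") falls short: as far as I can tell, == does not test each cell against every string. The vector is recycled down the columns, so each cell is compared with just one of the three strings, depending on its position. The sketch below rebuilds that recycled matrix by hand and checks that it reproduces the two partial matches. The names targets, recycled, hits_direct and hits_recycled are only illustrative, and re-creating the data frame with character columns is an assumption.

```r
# Sample data, re-created with character columns (assumption: stringsAsFactors = FALSE)
df1 <- data.frame(
  animal = c("a", "b", "c", "two", "five", "c"),
  level  = c("five", "one", "three", "30", "horse", "five"),
  length = c("10", "20", "30", "horse", "eight", "c"),
  stringsAsFactors = FALSE
)
targets <- c("c", "horse", "five")

# The direct comparison from the question
hits_direct <- which(df1 == targets, arr.ind = TRUE)

# Hand-built equivalent: repeat targets until there is one value per cell,
# filling column by column, then compare cell against cell
recycled <- matrix(rep_len(targets, prod(dim(df1))), nrow = nrow(df1))
hits_recycled <- which(as.matrix(df1) == recycled, arr.ind = TRUE)

print(recycled)
print(hits_direct)
print(hits_recycled)
```

Row 5 of the level column holds "horse" and happens to line up with the recycled "horse", which is why only a few cells match. Running one comparison per string, as in the Reduce approach, avoids depending on position.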
Another option would be to loop through the columns of the dataset with mutate_all and wrap with which
library(dplyr)
df1 %>%
  mutate_all(list(~ . %in% c("c", "horse", "five"))) %>%
  as.matrix %>%
  which(., arr.ind = TRUE)
NOTE: Here, we don't need any regex or partial matches if the OP wanted to do a full string match. It should be faster than doing any partial matches
Usually, for multiple elements %in% would be useful, but it works element-wise only on a vector and not on a data.frame
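Since each column of a data.frame is itself a vector, one way to still use %in% is to apply it column by column. This is a minimal base R sketch and not part of the answer above; targets, in_targets and hits are illustrative names, and the data frame is re-created with character columns as an assumption.

```r
# Sample data, re-created with character columns (assumption: stringsAsFactors = FALSE)
df1 <- data.frame(
  animal = c("a", "b", "c", "two", "five", "c"),
  level  = c("five", "one", "three", "30", "horse", "five"),
  length = c("10", "20", "30", "horse", "eight", "c"),
  stringsAsFactors = FALSE
)
targets <- c("c", "horse", "five")

# Apply %in% to each column; equal-length results are combined into a logical matrix
in_targets <- sapply(df1, `%in%`, targets)

# Row and column positions of every cell that equals one of the targets
hits <- which(in_targets, arr.ind = TRUE)
print(hits)
```

Because sapply returns a logical matrix here, which(..., arr.ind = TRUE) reports positions in the same row/col form as the Reduce approach, and extending the search only means adding strings to targets.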