Find multiple strings in entire dataframe


Problem description

I am trying to find multiple strings in my dataframe using the which function, extending the answer from "Find string in data.frame".

The example dataframe is:

df1 <- data.frame(animal = c('a','b','c','two','five','c'), level = c('five','one','three',30,'horse','five'), length = c(10,20,30,'horse','eight','c'))

  animal level length
1      a  five     10
2      b   one     20
3      c three     30
4    two    30  horse
5   five horse  eight
6      c  five      c

On this dataframe, when I apply the which function for one string, I get the correct output, e.g. which(df1 == "c", arr.ind = TRUE) gives:

     row col
[1,]   3   1
[2,]   6   1
[3,]   6   3

But when I try to search for multiple strings, I get only a partially correct output, e.g. which(df1 == c("c", "horse", "five"), arr.ind = TRUE) gives:

     row col
[1,]   5   2
[2,]   6   2

The expected output should be:

     row col
[1,]   3   1
[2,]   5   1
[3,]   6   1
[4,]   1   2
[5,]   5   2
[6,]   6   2
[7,]   4   3
[8,]   6   3

Hence my questions:

  1. Why does the solution with c("c", "horse", "five") not work?

I tried

which(df1 == "c" | df1 == "horse" | df1 == "five", arr.ind = TRUE)

That gives me the correct output, but it is too lengthy for many strings. How can I make my code succinct?

Recommended answer

We can loop through the vector of strings with lapply, compare each one to the whole dataframe with ==, Reduce the resulting list to a single logical matrix with |, and wrap the result with which

which(Reduce(`|`, lapply(c("c", "horse", "five"), `==`, df1)), arr.ind = TRUE)
#     row col
#[1,]   3   1
#[2,]   5   1
#[3,]   6   1
#[4,]   1   2
#[5,]   5   2
#[6,]   6   2
#[7,]   4   3
#[8,]   6   3
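
On the first question, the likely reason is that == compares element by element and recycles the shorter vector; it does not test membership. With 6 rows and 3 strings, every column ends up compared against the pattern c, horse, five, c, horse, five, so a cell matches only when its value happens to line up with its row position. The sketch below rebuilds that recycled comparison by hand. The names targets, recycled and partial are illustrative and not part of the original question, and all values are written as strings for simplicity.

```r
df1 <- data.frame(
  animal = c("a", "b", "c", "two", "five", "c"),
  level  = c("five", "one", "three", "30", "horse", "five"),
  length = c("10", "20", "30", "horse", "eight", "c")
)
targets <- c("c", "horse", "five")

# The targets recycled down the rows, column by column: this is what
# each cell is effectively compared against.
recycled <- matrix(rep_len(targets, nrow(df1) * ncol(df1)), nrow = nrow(df1))

# Only cells that coincide with their recycled partner are reported.
partial <- which(as.matrix(df1) == recycled, arr.ind = TRUE)
partial
#      row col
# [1,]   5   2
# [2,]   6   2
```

This reproduces the partial output from the question, which is why a membership test (one comparison per string, or %in% per column) is needed instead.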

Or another option would be to loop through the columns of the dataset with mutate_all, test each column with %in%, and wrap the result with which

library(dplyr)
df1 %>%
  mutate_all(list(~ . %in% c("c", "horse", "five"))) %>%
  as.matrix %>% 
  which(., arr.ind = TRUE)

NOTE: If the OP wants a full string match, no regex or partial matching is needed here. An exact comparison should also be faster than any partial match

Usually, %in% would be useful for multiple elements, but it works only on a vector and not on a data.frame
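
Since each column of a data.frame is itself a vector, one way around that limitation in base R is to apply %in% column by column. This is a minimal sketch rather than part of the original answer: targets and hits are illustrative names, and as.character is there only as a guard in case the columns are factors.

```r
df1 <- data.frame(
  animal = c("a", "b", "c", "two", "five", "c"),
  level  = c("five", "one", "three", "30", "horse", "five"),
  length = c("10", "20", "30", "horse", "eight", "c")
)
targets <- c("c", "horse", "five")

# sapply returns one logical vector per column and binds them
# into a logical matrix with the same shape as df1.
hits <- sapply(df1, function(col) as.character(col) %in% targets)

res <- which(hits, arr.ind = TRUE)
res
#      row col
# [1,]   3   1
# [2,]   5   1
# [3,]   6   1
# [4,]   1   2
# [5,]   5   2
# [6,]   6   2
# [7,]   4   3
# [8,]   6   3
```

The target strings live in a single vector, so adding more strings does not lengthen the call.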
