Subset based on list of strings using grepl()?


Problem description

I'm looking to do something seemingly very simple. I would like to subset a data frame in R using the grepl() command -- or something like it -- on several different phrases without constructing a loop.

For example, I'd like to pull out all the rows for anyone named Bob or Mary:

## example data frame:
tmp = structure(list(Name = structure(c(6L, 8L, 9L, 7L, 2L, 3L, 10L, 
1L, 5L, 4L), .Label = c("Alan", "Bob", "bob smith", "Frank", 
"John", "Mary Anne", "mary jane", "Mary Smith", "Potter, Mary", 
"smith, BOB"), class = "factor"), Age = c(31L, 23L, 23L, 55L, 
32L, 36L, 45L, 12L, 43L, 46L), Height = 1:10), .Names = c("Name", 
"Age", "Height"), class = "data.frame", row.names = c(NA, -10L
))

tmp

#           Name Age Height
#1     Mary Anne  31      1
#2    Mary Smith  23      2
#3  Potter, Mary  23      3
#4     mary jane  55      4
#5           Bob  32      5
#6     bob smith  36      6
#7    smith, BOB  45      7
#8          Alan  12      8
#9          John  43      9
#10        Frank  46     10

## this doesn't work
mynames=c('bob','mary')
tmp[grepl(mynames,tmp$Name,ignore.case=T),]

Any ideas would be helpful!

Solution

You can collapse your mynames vector into a single pattern with the regular expression alternation operator | and pass it to grep.

tmp[grep(paste(mynames, collapse='|'), tmp$Name, ignore.case=TRUE),]

#           Name Age Height
# 1    Mary Anne  31      1
# 2   Mary Smith  23      2
# 3 Potter, Mary  23      3
# 4    mary jane  55      4
# 5          Bob  32      5
# 6    bob smith  36      6
# 7   smith, BOB  45      7
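Since the question asked about grepl() specifically, here is a minimal sketch of the same idea with grepl(). The original call fails because grepl() uses only the first element of a pattern vector longer than one (with a warning), so only 'bob' was ever searched for. The names pattern, strict_pattern, hits and strict_hits are illustrative and not part of the original answer, and the data frame is rebuilt with data.frame() for brevity, so Name is a character column here rather than a factor.

```r
# Same example data as in the question, rebuilt with data.frame() for brevity
tmp <- data.frame(
  Name = c("Mary Anne", "Mary Smith", "Potter, Mary", "mary jane", "Bob",
           "bob smith", "smith, BOB", "Alan", "John", "Frank"),
  Age = c(31L, 23L, 23L, 55L, 32L, 36L, 45L, 12L, 43L, 46L),
  Height = 1:10
)
mynames <- c("bob", "mary")

# Collapse the vector into one alternation pattern: "bob|mary"
pattern <- paste(mynames, collapse = "|")

# grepl() returns one TRUE/FALSE per row, so it can index rows directly
hits <- tmp[grepl(pattern, tmp$Name, ignore.case = TRUE), ]

# Optional (an assumption about what you want): add word boundaries so that
# "mary" does not also match a longer name such as "Rosemary"
strict_pattern <- paste0("\\b(", pattern, ")\\b")
strict_hits <- tmp[grepl(strict_pattern, tmp$Name, ignore.case = TRUE), ]
```

The grep() and grepl() versions select the same rows here. The logical vector from grepl() is convenient when you want to combine the match with other conditions, for example grepl(...) & tmp$Age > 30.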
