Partial string matching in R

Question

I have a data frame:

d<-data.frame(name=c("brown cat", "blue cat", "big lion", "tall tiger", 
                     "black panther", "short cat", "red bird",
                     "short bird stuffed", "big eagle", "bad sparrow",
                     "dog fish", "head dog", "brown yorkie",
                     "lab short bulldog"), label=1:14)

I'd like to search the name column. If any of the words "cat", "lion", "tiger", or "panther" appear, I want to assign the character string feline to a new column, species, in the corresponding row.

If any of the words "bird", "eagle", or "sparrow" appear, I want to assign the character string avian to species in the corresponding row.

If any of the words "dog", "yorkie", or "bulldog" appear, I want to assign the character string canine to species in the corresponding row.

Ideally, I'd store these keywords in a list or something similar that I can keep at the beginning of the script. As new variants of each species show up in the name column, it would be easy to update what qualifies as feline, avian, or canine.

This question is almost answered here (How to create new column in dataframe based on partial string matching other column in R), but that answer doesn't address the multiple-keyword twist in this problem.

Solution

There may be a more elegant solution than this, but you could use grep with | to specify alternative matches.

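# Each pattern uses | to match any one of the alternatives; matching rows get the label,
# and rows matched by none of the patterns are left as NA in the new species column
d[grep("cat|lion|tiger|panther", d$name), "species"] <- "feline"
d[grep("bird|eagle|sparrow", d$name), "species"] <- "avian"
d[grep("dog|yorkie", d$name), "species"] <- "canine"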

I've assumed you meant "avian", and left out "bulldog" because the "dog" pattern already matches it.
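Leaving out "bulldog" works because grep matches substrings: "dog" is found anywhere inside a name, including inside "bulldog". The small illustration below shows that behaviour and, as an assumption beyond the original answer, how \\b word boundaries (used here with perl = TRUE) would restrict matches to whole words, in which case "bulldog" would need its own entry again.

```r
x <- c("lab short bulldog", "head dog", "brown yorkie")

# Plain substring matching: "dog" is found inside "bulldog"
grepl("dog", x)
# [1]  TRUE  TRUE FALSE

# With \\b word boundaries, "dog" must be a whole word, so "bulldog" no longer matches
grepl("\\b(dog|yorkie)\\b", x, perl = TRUE)
# [1] FALSE  TRUE  TRUE

# Whole-word matching therefore needs "bulldog" listed explicitly
grepl("\\b(dog|yorkie|bulldog)\\b", x, perl = TRUE)
# [1] TRUE TRUE TRUE
```

Whole-word matching is stricter: it avoids accidental hits such as "cat" inside a longer word, at the cost of listing every variant.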

You might want to add ignore.case = TRUE to the grep calls.
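A quick sketch of what ignore.case = TRUE changes, using a few hypothetical mixed-case names (the sample data in the question is all lowercase, so the answer's output would be the same either way):

```r
x <- c("Brown Cat", "big LION", "red bird")

# Default matching is case-sensitive, so neither "Cat" nor "LION" is found
grep("cat|lion", x)
# integer(0)

# ignore.case = TRUE applies to the whole alternation pattern
grep("cat|lion", x, ignore.case = TRUE)
# [1] 1 2
```

In the answer's code this would read, for example, d[grep("cat|lion|tiger|panther", d$name, ignore.case = TRUE), "species"] <- "feline".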

Output (printing d after the three assignments):

#                 name label species
#1           brown cat     1  feline
#2            blue cat     2  feline
#3            big lion     3  feline
#4          tall tiger     4  feline
#5       black panther     5  feline
#6           short cat     6  feline
#7            red bird     7   avian
#8  short bird stuffed     8   avian
#9           big eagle     9   avian
#10        bad sparrow    10   avian
#11           dog fish    11  canine
#12           head dog    12  canine
#13       brown yorkie    13  canine
#14  lab short bulldog    14  canine
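The question also asks for keyword lists that can sit at the top of the script. One way to get that, sketched here as an assumption rather than as the answerer's method, is a named list (species_keywords is a hypothetical name) whose element names are the labels; each element is collapsed with paste(..., collapse = "|") into the same kind of alternation pattern the answer writes by hand.

```r
# Hypothetical keyword table kept at the top of the script; edit it as new names appear.
# Each element name is the label written to the species column.
species_keywords <- list(
  feline = c("cat", "lion", "tiger", "panther"),
  avian  = c("bird", "eagle", "sparrow"),
  canine = c("dog", "yorkie", "bulldog")
)

d <- data.frame(name = c("brown cat", "blue cat", "big lion", "tall tiger",
                         "black panther", "short cat", "red bird",
                         "short bird stuffed", "big eagle", "bad sparrow",
                         "dog fish", "head dog", "brown yorkie",
                         "lab short bulldog"),
                label = 1:14, stringsAsFactors = FALSE)

# Start with NA so names matching no keyword stay unlabelled
d$species <- NA_character_
for (sp in names(species_keywords)) {
  # Collapse the keywords into one pattern, e.g. "cat|lion|tiger|panther"
  pattern <- paste(species_keywords[[sp]], collapse = "|")
  d$species[grepl(pattern, d$name, ignore.case = TRUE)] <- sp
}
print(d)
```

With the question's data this produces the same species column as the output above. If a name matches keywords from more than one species, the species listed later in species_keywords wins, because its assignment runs last. The keywords are treated as regular expressions, so characters such as . or ( would need escaping.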
