Extracting a word from a sentence in R


Question


I am trying to extract words that begin with certain letters. For instance, in this example I am trying to extract the words that start with 'AB':

x = c("So much fun - AB22148",
      "AC33648 does whatever",
      "I know -AB11025 Failed",
      "Nothing stalled - AB16228",
      "Unable to do fdS2083D - Ab26604")

Num = character(0)
for (i in 1:length(x)) {
  y = unlist(strsplit(x[i], " "))
  Num[i] = grep("AB", y, perl = T, value = T, ignore.case = T)
}


There are a couple of issues (as you can probably tell):
1. If 'AB' is not present, I get an error, because grep() returns a zero-length result and Num[i] cannot be assigned a zero-length value.
2. If I work around that (e.g. by replacing AC with AB), the 5th entry gives me 'Unable' instead of "Ab26604".


What I am looking for:
1. Can it be done without the loop (perhaps using one of the apply functions)?
2. How do I handle the 3rd and 5th cases? (I would like to remove the '-' sign. I can take care of this in a later step, but was wondering whether it can be done at the same time.)

Num (current output)
[1] "AB22148"  " "  "-AB11025" "AB16228"  "Unable"

Num (required output)
[1] "AB22148"  " "  "AB11025" "AB16228"  "Ab26604"


Thanks for all the help, I really appreciate it. Kindly let me know if you need any further clarification.

Recommended answer

You can do the following with the stringr package:

library(stringr)
tmp <- str_extract(x, regex("AB[:alnum:]{5}", ignore_case = TRUE))

This gives you:

"AB22148" NA        "AB11025" "AB16228" "Ab26604"
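If you would rather not depend on stringr, base R's regexpr() and regmatches() can do the same extraction. This is a minimal sketch swapped in for comparison, not part of the original answer. It assumes, like the stringr pattern, that an ID is "AB" (any case) followed by exactly five alphanumeric characters. The name tmp_base is purely illustrative.

```r
x <- c("So much fun - AB22148",
       "AC33648 does whatever",
       "I know -AB11025 Failed",
       "Nothing stalled - AB16228",
       "Unable to do fdS2083D - Ab26604")

# regexpr() gives the position of the first match in each element (-1 if none);
# ignore.case = TRUE lets "Ab26604" match as well.
m <- regexpr("AB[[:alnum:]]{5}", x, ignore.case = TRUE)

# regmatches() returns only the matched substrings, so place them into a
# vector pre-filled with NA, mirroring what str_extract() returns.
tmp_base <- rep(NA_character_, length(x))  # illustrative name
tmp_base[m != -1] <- regmatches(x, m)
tmp_base
```

The "ab" inside "Unable" is not returned, because it is followed by "le to" rather than five alphanumeric characters. The search therefore moves on to "Ab26604".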


If you want to replace the NA with " ", you can do:

str_replace_na(tmp, " ") # assuming tmp is the result from above

This gives you:

"AB22148" " "       "AB11025" "AB16228" "Ab26604"
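The question also asked whether the original split-on-spaces idea can work without the explicit loop. Here is a minimal sketch using vapply(). It assumes the IDs are separated by spaces and start with "AB" (any case), possibly preceded by a single "-". The helper first_ab_word is hypothetical. Anchoring the pattern at the start of each word with "^" is what stops "Unable" from matching. Stripping a leading "-" fixes the 3rd case.

```r
x <- c("So much fun - AB22148",
       "AC33648 does whatever",
       "I know -AB11025 Failed",
       "Nothing stalled - AB16228",
       "Unable to do fdS2083D - Ab26604")

# Hypothetical helper: return the first word that starts with "ab" (any case)
# after removing a leading "-", or " " when no word qualifies.
first_ab_word <- function(sentence) {
  words <- strsplit(sentence, " ", fixed = TRUE)[[1]]
  words <- sub("^-", "", words)          # "-AB11025" -> "AB11025"
  hits <- grep("^ab", words, ignore.case = TRUE, value = TRUE)
  if (length(hits) == 0) " " else hits[1]  # avoids the zero-length assignment error
}

# vapply() replaces the for loop and guarantees one string per sentence.
Num <- vapply(x, first_ab_word, character(1), USE.NAMES = FALSE)
Num
```

Unlike the regex answer, this version does not check that exactly five characters follow "AB". It returns whatever word starts with that prefix.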
