从字符串向量数据中提取一串单词 [英] Extracting a string of words from a string vector data

查看:30
本文介绍了从字符串向量数据中提取一串单词的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个字符串向量数据,如下所示

I have a string vector data as shown below

Data
Posted by Mohit Garg on May 7, 2016
Posted by Dr. Lokesh Garg on April 8, 2018
Posted by Lokesh.G.S  on June 11, 2001
Posted by Mohit.G.S. on July 23, 2005
Posted by Dr.Mohit G Kumar Saha on August 2, 2019

我使用了 str_extract() 函数作为

I have used str_extract() function as

str_extract(Data, "Posted by \\w+. \\w+ \\w+")

它生成的输出为

[1] "Posted by Mohit Garg on"   "Posted by Dr. Lokesh Garg" NA                         
[4] NA                          NA  

我希望输出应该像

[1] "Posted by Mohit Garg on"   "Posted by Dr. Lokesh Garg"  "Posted by Lokesh.G.S"                       
[4] "Posted by Mohit.G.S."                     "Posted by Dr.Mohit G Kumar Saha"

推荐答案

或许你可以试试:

stringr::str_extract(df$Data, "Posted by .+?(?=\\s+on)")

#[1] "Posted by Mohit Garg" "Posted by Dr. Lokesh Garg"  "Posted by Lokesh.G.S"
#[4] "Posted by Mohit.G.S." "Posted by Dr.Mohit G Kumar Saha"

这会提取从 "Posted by""on" 的所有内容,不包括 "on".

This extracts everything from "Posted by" till "on" excluding "on".

在基础 R 中相同:

sub(".*(Posted by .+?)(?=\\s+on).*", '\\1', df$Data, perl = TRUE) 

数据

df <- structure(list(Data = c("Posted by Mohit Garg on May 7, 2016", 
"Posted by Dr. Lokesh Garg on April 8, 2018", "Posted by Lokesh.G.S  on June 11, 2001", 
"Posted by Mohit.G.S. on July 23, 2005", "Posted by Dr.Mohit G Kumar Saha on August 2, 2019"
)), class = "data.frame", row.names = c(NA, -5L))

这篇关于从字符串向量数据中提取一串单词的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆