Detect a list of words in a string variable and extract matched words to a new variable in data frame


Question

I have a two-variable data frame, one column of which is a character vector. Each row of "MyVector" contains a string with exactly one name (e.g. "Pete"), but the position of the name within the string varies. I want code that matches the names in a list against each string and extracts the matching name into a new variable in the data frame. If the name were always in the same position in "MyVector", I would simply create the new variable as a substring of MyVector. I tried various versions of str_detect from stringr, to no avail.

Challenge: how do I detect or extract the name into a new variable in MyDF when the name can appear at different positions in the string?

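# Create the data frame
var.1 <- rep(c(1, 5, 3), 2)

MyVector <- c("I know Pete", "Jerry has a new job", "Victor is an employee", "How to work with Pete", "Too Many Students", "Bob is mean")
# data.frame() keeps var.1 numeric; as.data.frame(cbind(...)) would coerce every column to character (or factor)
MyDF <- data.frame(var.1, MyVector, stringsAsFactors = FALSE)

# Create a vector of the names I want to extract into a new column of the data frame
Extract <- c("Jerry", "Pete", "Bob", "Victor")

# match() would be perfect if it looked for the names inside the strings, but it
# compares whole strings for exact equality, so this returns NA for every row
MyDF$newvar <- match(MyDF$MyVector, Extract)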
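Because match() compares whole elements, no sentence in MyVector equals a bare name, so every row gets NA. Below is a minimal sketch of the str_detect() route the question mentions, assuming each row contains at most one of the names; if a row contained several, this version would pick the one listed first in Extract, not the one appearing first in the sentence.

```r
library(stringr)

MyVector <- c("I know Pete", "Jerry has a new job", "Victor is an employee",
              "How to work with Pete", "Too Many Students", "Bob is mean")
Extract <- c("Jerry", "Pete", "Bob", "Victor")

# match() tests whole strings for equality, so every result is NA here
match(MyVector, Extract)

# For each name, str_detect() checks every sentence; fixed() treats the name
# as a literal string rather than a regex. sapply() assembles a logical matrix
# with one row per sentence and one column per name.
hits <- sapply(Extract, function(name) str_detect(MyVector, fixed(name)))

# For each row, keep the first detected name, or NA if none was found
NEWVAR <- apply(hits, 1, function(row) {
  if (any(row)) Extract[row][1] else NA_character_
})
NEWVAR
# "Pete" "Jerry" "Victor" "Pete" NA "Bob"
```

This builds the full sentence-by-name table of hits, which is more work than a single regex but makes it easy to inspect or count the matches per row.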

My final data.frame should look something like the output below.

  var.1              MyVector NEWVAR
1     1           I know Pete   Pete
2     5   Jerry has a new job  Jerry
3     3 Victor is an employee Victor
4     1 How to work with Pete   Pete
5     5     Too Many Students     NA
6     3           Bob is mean    Bob


Answer

We can use str_extract after pasting the names in 'Extract' together into a single regular expression, with "|" separating the alternatives:

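library(stringr)
# Collapse the names into one alternation, "Jerry|Pete|Bob|Victor",
# then extract the first match in each row (NA when nothing matches)
MyDF$NEWVAR <- str_extract(MyDF$MyVector, paste(Extract, collapse = "|"))
MyDF$NEWVAR
#[1] "Pete"   "Jerry"  "Victor" "Pete"   NA       "Bob"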
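str_extract() returns only the first match in each string, and a bare alternation also matches inside longer words ("Bob" in "Bobby"). The sketch below shows two possible refinements, assuming the names contain only letters (names with regex metacharacters would need escaping first). Word boundaries (\\b) restrict matches to whole words, and str_extract_all() collects every name in a row. The example sentences are made up for illustration.

```r
library(stringr)

MyVector <- c("I know Pete", "Victor and Bob are employees", "Bobby is here")
Extract <- c("Jerry", "Pete", "Bob", "Victor")

# \\b anchors the alternation at word boundaries, so "Bob" no longer
# matches inside "Bobby"
pattern <- paste0("\\b(", paste(Extract, collapse = "|"), ")\\b")

# str_extract() keeps only the first match in each string
first <- str_extract(MyVector, pattern)
# "Pete" "Victor" NA

# str_extract_all() returns every match as a list of character vectors;
# collapse each element into one comma-separated string (NA when empty)
all_names <- vapply(
  str_extract_all(MyVector, pattern),
  function(x) if (length(x) > 0) paste(x, collapse = ", ") else NA_character_,
  character(1)
)
# "Pete" "Victor, Bob" NA
```

Keeping the result as one string per row means it can be assigned straight back to a data frame column, just like the output of str_extract().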
