R: extract and paste keyword matches
Problem description
I am new to R and have been struggling with this one. I want to create a new column that checks whether any of a set of words ("foo", "x", "y") appears in the column 'text', and then writes the matching words into that new column.
I have a data frame, a, that looks like this:
id text time username
1 "hello x" 10 "me"
2 "foo and y" 5 "you"
3 "nothing" 15 "everyone"
4 "x,y,foo" 0 "know"
The correct output, a2, should be:
id text time username keywordtag
1 "hello x" 10 "me" x
2 "foo and y" 5 "you" foo,y
3 "nothing" 15 "everyone" 0
4 "x,y,foo" 0 "know" x,y,foo
I have this:
df1 <- data.frame(text = c("hello x", "foo and y", "nothing", "x,y,foo"))
terms <- c('foo', 'x', 'y')
df1$keywordtag <- apply(sapply(terms, grepl, df1$text), 1, function(x) paste(terms[x], collapse=','))
This works, but it crashes R when my needleList contains 12k words and my text has 155k rows. Is there a way to do this that won't crash R?
Recommended answer
This is a variation on what you have done and on what was suggested in the comments. It uses dplyr and stringr. There may be a more efficient way, but this may not crash your R session.
library(dplyr)
library(stringr)
terms <- c('foo', 'x', 'y')
term_regex <- paste0('(', paste(terms, collapse = '|'), ')')
### Solution: this uses dplyr::mutate and stringr::str_extract_all
df1 %>%
  mutate(keywordtag = sapply(str_extract_all(text, term_regex), function(x) paste(x, collapse = ',')))
# text keywordtag
#1 hello x x
#2 foo and y foo,y
#3 nothing
#4 x,y,foo x,y,foo
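For comparison, here is a minimal base R sketch of the same single-pattern idea. It is not part of the original answer. It assumes the terms are plain words with no regex metacharacters (otherwise they would need escaping), and it adds word boundaries so that "x" does not match inside a longer word. It also fills in "0" for rows with no match, as in the desired output from the question. Whether a 12k-term alternation performs acceptably on 155k rows is untested here. The first calculation shows why the sapply/grepl version probably runs out of memory: it builds a full rows-by-terms logical matrix.

```r
df1 <- data.frame(text = c("hello x", "foo and y", "nothing", "x,y,foo"),
                  stringsAsFactors = FALSE)
terms <- c("foo", "x", "y")

# Rough size in GB of the logical matrix from sapply(terms, grepl, text)
# with 155k rows and 12k terms (R stores each logical value in 4 bytes).
matrix_gb <- 155000 * 12000 * 4 / 1e9   # about 7.4

# One alternation pattern with word boundaries (assumes terms contain
# no regex metacharacters).
pattern <- paste0("\\b(", paste(terms, collapse = "|"), ")\\b")

# gregexpr() locates every match per row; regmatches() extracts them.
hits <- regmatches(df1$text, gregexpr(pattern, df1$text, perl = TRUE))

# Collapse each row's matches (duplicates dropped) into one string.
df1$keywordtag <- vapply(hits,
                         function(m) paste(unique(m), collapse = ","),
                         character(1))

# Rows without any match get "0", as in the desired output.
df1$keywordtag[df1$keywordtag == ""] <- "0"
```

Note that the matches come out in the order they appear in the text, not in the order of terms, so the fourth row yields x,y,foo.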