R: extract and paste keyword matches

Question

I am new to R and have been struggling with this one. I want to create a new column that checks whether any of a set of words ("foo", "x", "y") appear in the column 'text', and then writes the matching words into the new column.

I have a data frame that looks like this: a ->

 id     text        time   username
 1     "hello x"     10     "me"
 2     "foo and y"   5      "you"
 3     "nothing"     15     "everyone"
 4     "x,y,foo"     0      "know"

The correct output should be:

a2 ->

 id     text        time   username        keywordtag
 1     "hello x"     10     "me"          x
 2     "foo and y"   5      "you"         foo,y
 3     "nothing"     15     "everyone"    0 
 4     "x,y,foo"     0      "know"        x,y,foo

I have this:

df1 <- data.frame(text = c("hello x", "foo and y", "nothing", "x,y,foo"))
terms <- c('foo', 'x', 'y')
df1$keywordtag <- apply(sapply(terms, grepl, df1$text), 1, function(x) paste(terms[x], collapse=','))

This works, but it crashes R when my needleList contains 12k words and my text has 155k rows. Is there a way to do this that won't crash R?

Recommended answer

This is a variation on what you have done and on what was suggested in the comments. It uses dplyr and stringr. There may be a more efficient way, but this may not crash your R session.

library(dplyr)
library(stringr)

terms      <- c('foo', 'x', 'y')
term_regex <- paste0('(', paste(terms, collapse = '|'), ')')

### Solution: this uses dplyr::mutate and stringr::str_extract_all
df1 %>%
    mutate(keywordtag = sapply(str_extract_all(text, term_regex), function(x) paste(x, collapse=',')))
#       text keywordtag
#1   hello x          x
#2 foo and y      foo,y
#3   nothing           
#4   x,y,foo    x,y,foo
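
For scale, the `sapply(terms, grepl, df1$text)` call in the question builds a dense logical matrix with one cell per (row, term) pair. With 155k rows and 12k terms that is about 1.86 billion cells, and since R stores each logical in 4 bytes, on the order of 7 GB, which may well be what brings the session down. The sketch below is one possible alternative in base R, not part of the original answer: it splits each text into tokens and looks them up with `%in%`, so no such matrix and no 12k-term regex is needed. It rests on assumptions that may not fit your data: keywords are whole words (so "x" would not match inside "xylophone", unlike `grepl` or the regex above), tokens are separated by non-alphanumeric characters, and repeated keywords are reported once. The helper name `tag_keywords` is made up for illustration.

```r
# Hypothetical helper (not from the original answer): tag each text with the
# keywords it contains, matching whole tokens rather than substrings.
tag_keywords <- function(text, terms, sep = ",") {
  # Split on runs of non-alphanumeric characters: "x,y,foo" -> "x" "y" "foo"
  tokens <- strsplit(as.character(text), "[^[:alnum:]]+")
  vapply(tokens, function(tok) {
    # Keep matching tokens in order of first appearance, without repeats
    hits <- unique(tok[tok %in% terms])
    paste(hits, collapse = sep)
  }, character(1))
}

df1 <- data.frame(text = c("hello x", "foo and y", "nothing", "x,y,foo"),
                  stringsAsFactors = FALSE)
terms <- c("foo", "x", "y")

df1$keywordtag <- tag_keywords(df1$text, terms)
df1
```

Rows without a match get an empty string, as in the dplyr version; if a literal 0 is wanted, as in the question's expected output, something like `df1$keywordtag[df1$keywordtag == ""] <- "0"` afterwards should do it.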
