Exact matching of text with a dataframe column in R


Question

I have a vector of words in R:

words = c("Awesome","Loss","Good","Bad")

And I have the following dataframe in R:

df <- data.frame(ID = c(1,2,3),
                 Response = c("Today is an awesome day", 
                              "Yesterday was a bad day,but today it is good",
                              "I have losses today"))

I want to extract the words that match exactly (as whole words) in the Response column and insert them into a new column of the dataframe. The final output should look like this:

ID  Response                                      Match
1   Today is an awesome day                       Awesome
2   Yesterday was a bad day,but today it is good  Bad,Good
3   I have losses today                           NA

I used the following code:

x <- sapply(words, function(x) grepl(tolower(x), tolower(df$Response)))

Then I paste the matching words together:

df$Words <- apply(x, 1, function(i) paste0(names(i)[i], collapse = ","))

This returns partial matches (for example, "Loss" matches inside "losses") rather than exact whole-word matches. Please help.

Recommended answer

Change the function passed to the first *apply call into a two-line function. With the regex "\\bword\\b", the word matches only where it is surrounded by word boundaries.

x <- sapply(words, function(x) {
  y <- paste0("\\b", x, "\\b")
  grepl(tolower(y), tolower(df$Response))
})
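As a quick check of what the boundaries change, here is a minimal sketch comparing a plain substring pattern with a word-bounded one on the third response. The variable names are illustrative only, and `ignore.case = TRUE` is shown as a possible alternative to lower-casing both sides; it is not part of the original answer.

```r
# Third response from the question's data
response <- "I have losses today"

# Plain pattern: "loss" is found as a substring of "losses"
substring_hit <- grepl("loss", tolower(response))

# Word-bounded pattern: "losses" is not the whole word "loss"
bounded_hit <- grepl("\\bloss\\b", tolower(response))

# Alternative to tolower(): let grepl ignore case itself
ci_hit <- grepl("\\bAwesome\\b", "Today is an awesome day", ignore.case = TRUE)

print(c(substring = substring_hit, bounded = bounded_hit, ignore_case = ci_hit))
```

The first pattern matches and the bounded one does not, which is the difference between the question's result and the desired one.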

Now run the second apply exactly as posted in the question.

df$Words <- apply(x, 1, function(i) paste0(names(i)[i], collapse = ","))

df
#  ID                                     Response    Words
#1  1                      Today is an awesome day  Awesome
#2  2 Yesterday was a bad day,but today it is good Good,Bad
#3  3                          I have losses today       

For the NA, I would use the replacement function is.na<-, which sets the selected elements to NA.

is.na(df$Words) <- df$Words == ""
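The steps above can also be wrapped into one function. This is only a sketch: `match_words` is a hypothetical helper name, it assumes the words contain no regex metacharacters, and it uses `ignore.case = TRUE` in place of the `tolower()` calls.

```r
# Hypothetical helper: for each text, return the comma-separated words
# that occur as whole words, or NA when none match.
match_words <- function(text, words) {
  # One logical column per word: does the word occur between boundaries?
  hits <- sapply(words, function(w) {
    grepl(paste0("\\b", w, "\\b"), text, ignore.case = TRUE)
  })
  # sapply() returns a plain vector when `text` has length 1,
  # so force a matrix with one row per text
  hits <- matrix(hits, nrow = length(text), dimnames = list(NULL, words))
  # Paste the matching words together, row by row
  out <- apply(hits, 1, function(i) paste(words[i], collapse = ","))
  # Rows without any match come back as "", so turn them into NA
  is.na(out) <- out == ""
  out
}

responses <- c("Today is an awesome day",
               "Yesterday was a bad day,but today it is good",
               "I have losses today")
result <- match_words(responses, c("Awesome", "Loss", "Good", "Bad"))
print(result)
```

With a dataframe this would be called as `df$Words <- match_words(df$Response, words)`. The words appear in the order of the `words` vector, so the second row reads "Good,Bad" rather than "Bad,Good".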

Data.

df <- read.table(text = "
ID           Response
1            'Today is an awesome day'
2            'Yesterday was a bad day,but today it is good'
3            'I have losses today'
", header = TRUE)

words <- c("Awesome","Loss","Good","Bad")
