Making gsub only replace entire words?


Question

(I'm using R.) I have a list of words called "goodwords.corpus". I am looping through the documents in a corpus and replacing each word that appears on the "goodwords.corpus" list with the word + a number.

So, for example, if the word "good" is on the list and "goodnight" is NOT on the list, then this document:

I am having a good time goodnight

would become:

I am having a good 1234 time goodnight

I'm using this code (edited to make it reproducible):

goodwords.corpus <- c("good")
test <- "I am having a good time goodnight"
for (i in 1:length(goodwords.corpus)){
test <-gsub(goodwords.corpus[[i]], paste(goodwords.corpus[[i]], "1234"), test)
}

However, I want gsub to replace only ENTIRE words. Because "good" is on the "goodwords.corpus" list, "goodnight", which is NOT on the list, is also affected. So I get this:

I am having a good 1234 time good 1234night

Is there any way I can tell gsub to replace only ENTIRE words, and not words that are part of other words?

I'd like to use something like this inside the loop:

test <- gsub("\\<goodwords.corpus[[i]]\\>", paste(goodwords.corpus[[i]], "1234"), test)

I've read that \< and \> tell gsub to match only whole words. But obviously that doesn't work, because goodwords.corpus[[i]] is treated as literal text when it's inside the quotes.

Any suggestions?

Recommended answer

You are so close to getting this. You're already using paste to form the replacement string, so why not use it to form the pattern string as well?

goodwords.corpus <- c("good")
test <- "I am having a good time goodnight"
for (i in 1:length(goodwords.corpus)){
    test <-gsub(paste0('\\<', goodwords.corpus[[i]], '\\>'), paste(goodwords.corpus[[i]], "1234"), test)
}
test
# [1] "I am having a good 1234 time goodnight"

(paste0 is simply paste(..., sep='').)
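To make the paste0 point concrete, here is a small sketch; the variable names word, spaced, and tight are only for illustration. paste uses a space as its default separator, and those spaces would end up inside the regular expression, so the pattern needs paste0 (or sep = '').

```r
word <- "good"

# paste() joins with sep = " " by default, so spaces land inside the pattern
spaced <- paste("\\<", word, "\\>")
# paste0() joins with no separator, giving the intended pattern
tight <- paste0("\\<", word, "\\>")

cat(spaced, "\n")  # prints: \< good \>
cat(tight, "\n")   # prints: \<good\>

# paste0() and paste(..., sep = "") build the same string
identical(tight, paste("\\<", word, "\\>", sep = ""))  # TRUE
```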

(I posted this at the same time as @MatthewLundberg, and his answer is also correct. I'm actually more familiar with using \b rather than \<, but I thought I'd continue with your code.)
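For reference, a sketch of the \b variant might look like the following. It also drops the loop by joining all the words into one alternation and reusing the matched word through the \\1 backreference. Adding "time" as a second list entry, the variable names pattern and result, and perl = TRUE are my own choices for illustration. The sketch also assumes the words contain no regex metacharacters (something like "c++" would need escaping first).

```r
# "time" is added to the list only to show that several words work at once
goodwords.corpus <- c("good", "time")
test <- "I am having a good time goodnight"

# Build one pattern: \b(good|time)\b
pattern <- paste0("\\b(", paste(goodwords.corpus, collapse = "|"), ")\\b")

# \\1 puts back whichever word matched, followed by the number
result <- gsub(pattern, "\\1 1234", test, perl = TRUE)
result
# [1] "I am having a good 1234 time 1234 goodnight"
```

A single gsub call makes one pass over each document, which may matter when the word list or the corpus is large.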
