How do I replace the string exactly using gsub()?
Question
I have a corpus: txt = "a patterned layer within a microelectronic pattern." I would like to replace exactly the term "pattern" with "form". I tried this code:
txt_replaced = gsub("pattern","form",txt)
However, the resulting text in txt_replaced is: "a formed layer within a microelectronic form."
As you can see, the term "patterned" is wrongly replaced by "formed", because part of "patterned" matches "pattern".
Can gsub() replace the string exactly? That is, only a term that matches exactly, as a whole word, should be replaced.
The result I am hoping for is: "a patterned layer within a microelectronic form."
Thank you very much!
Recommended answer
As @koshke noted, a very similar question has been answered before (by me). But that was grep and this is gsub, so I'll answer it again:
"\<" is an escape sequence for the beginning of a word, and "\>" is the end. In R strings you need to double the backslashes, so:
txt <- "a patterned layer within a microelectronic pattern."
txt_replaced <- gsub("\\<pattern\\>","form",txt)
txt_replaced
# [1] "a patterned layer within a microelectronic form."
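To see why the anchors matter, it can help to count the matches with and without them. The following is only a minimal sketch using base R's gregexpr() and regmatches(); the variable names n_plain and n_word are made up for illustration.

```r
txt <- "a patterned layer within a microelectronic pattern."

# Without anchors, the regex also matches the "pattern" inside "patterned".
n_plain <- lengths(regmatches(txt, gregexpr("pattern", txt)))

# With word anchors, only the standalone word "pattern" qualifies.
n_word <- lengths(regmatches(txt, gregexpr("\\<pattern\\>", txt)))

print(c(plain = n_plain, word = n_word))
```

The unanchored pattern should report two matches and the anchored one a single match, which is why only the final word is replaced.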
Or, you could use \b instead of \< and \>. \b matches a word boundary, so it can be used at both ends:
txt_replaced <- gsub("\\bpattern\\b","form",txt)
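If whole-word replacement is needed in several places, the \b approach could be wrapped in a small helper. This is a sketch, not part of the original answer: replace_word is a hypothetical name, and it assumes the word contains only letters, digits, or underscores (regex metacharacters in the word would need escaping first).

```r
# Hypothetical helper: replace `word` only where it stands alone as a word.
# Assumption: `word` has no regex metacharacters (letters, digits, "_" only).
replace_word <- function(text, word, replacement) {
  pattern <- paste0("\\b", word, "\\b")  # word boundary on both sides
  gsub(pattern, replacement, text)
}

txt <- "a patterned layer within a microelectronic pattern."

# Default regex engine
res_default <- replace_word(txt, "pattern", "form")

# The same \b pattern also works with the Perl-compatible engine
res_perl <- gsub("\\bpattern\\b", "form", txt, perl = TRUE)

print(res_default)
```

Both calls should leave "patterned" untouched and change only the final "pattern".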
Also note that if you want to replace only ONE occurrence, you should use sub instead of gsub.
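The difference between the two functions shows up as soon as the word occurs more than once. A short sketch, where the sample string txt2 is made up for illustration:

```r
# Illustrative input with two standalone occurrences of "pattern"
txt2 <- "pattern after pattern"

# sub() replaces only the first match
first_only <- sub("\\bpattern\\b", "form", txt2)

# gsub() replaces every match
all_matches <- gsub("\\bpattern\\b", "form", txt2)

print(first_only)   # "form after pattern"
print(all_matches)  # "form after form"
```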