In R, how do I replace a string that contains a certain pattern with another string?
Problem description
I'm working on a project that involves cleaning a list of data on college majors. Many of the entries are misspelled, so I was hoping to use the function gsub() to replace each misspelled entry with its correct spelling. For example, say 'biolgy' is a misspelling in a list of majors called Major. How can I get R to detect the misspelling and replace it with the correct spelling? I've tried gsub('biol', 'Biology', Major), but that only replaces the first four letters of 'biolgy'. If I use gsub('biolgy', 'Biology', Major), it works for that one case, but it doesn't catch other misspellings of 'biology'.
Thank you!
You should either define some nifty regular expression or use agrep from the base package. The stringr package is another option. I know that people use it, but I'm a huge fan of regular expressions, so it's a no-no for me.
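One possible reading of the "nifty regular expression" suggestion is to anchor the pattern so that it matches the whole entry rather than just its first four letters. The sketch below is only an illustration: the contents of Major are made up, and the rule "anything that starts with biol is meant to be Biology" is an assumption that may not hold for real data.

```r
# Hypothetical sample data; the real Major vector is not shown in the question.
Major <- c("biolgy", "Biology", "BIOLOGY", "biol", "Chemistry", "Microbiology")

# Unanchored pattern: only the matched letters "biol" are replaced,
# so the leftover "gy" stays behind.
partial <- gsub("biol", "Biology", "biolgy")
print(partial)  # "Biologygy"

# Anchored pattern: ^ and $ force the match to span the whole entry,
# so the entire string is replaced. Assumes every entry starting with
# "biol" (in any case) is a variant of "Biology".
fixed <- sub("^biol[a-z]*$", "Biology", Major, ignore.case = TRUE)
print(fixed)
```

Because of the ^ anchor, an entry such as "Microbiology" is left alone. Misspellings that do not start with "biol" (for example "boilogy") would be missed, which is where approximate matching becomes more useful.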
Anyway, agrep should do the trick:
agrep("biol", "biology")
[1] 1
agrep("biolgy", "biology")
[1] 1
EDIT:
You should also use ignore.case = TRUE, but be prepared to do some bookkeeping "by hand"...
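To connect this back to the original question, here is a minimal sketch of how agrep could be used to replace the matches rather than just find them. The sample values in Major are invented for illustration, and the default max.distance is assumed to be a suitable tolerance; real data will likely need tuning.

```r
# Hypothetical sample data; the real Major vector is not shown in the question.
Major <- c("Biology", "biolgy", "BIOLIGY", "Chemistry", "History")

# agrep(pattern, x) returns the indices of the elements of x that
# approximately match the pattern. With the default max.distance of 0.1,
# a 7-letter pattern tolerates one insertion, deletion, or substitution.
hits <- agrep("biology", Major, ignore.case = TRUE)

# Overwrite every approximate match with the canonical spelling.
cleaned <- Major
cleaned[hits] <- "Biology"
print(cleaned)
```

Note that agrep looks for an approximate match anywhere inside each string, so an entry like "Microbiology" would also be flagged. This is probably part of the "bookkeeping by hand" mentioned above: it seems wise to inspect Major[hits] before overwriting anything.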