R: Find and remove all one to two letter words
Problem description
I am attempting to clean away any one- or two-letter words from a text passage. This was my first thought:
gsub(" [a-zA-Z]{1,2} ", " ", "a ab abc B BB BBB")
[1] "a abc BB BBB"
I can see why the "a" is not replaced, as it is not preceded by a space, and I can see why the "BB" is not replaced, as the space in front of it has already been consumed by the " B " match.
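To make that diagnosis concrete, here is a small sketch that lists what the first pattern actually matches. It uses only base R (`gregexpr` and `regmatches`); the variable names `x` and `hits` are just illustrative.

```r
x <- "a ab abc B BB BBB"

# Extract every non-overlapping match of the first-attempt pattern.
hits <- regmatches(x, gregexpr(" [a-zA-Z]{1,2} ", x))[[1]]
print(hits)
# [1] " ab " " B "

# "a" has no leading space, so it can never match.
# The space before "BB" was already consumed as the trailing space of " B ",
# and matches cannot overlap, so "BB" is skipped as well.
```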
Recommended answer
You can make use of the \b word boundary and the [[:alpha:]] bracket expression with the {1,2} limiting quantifier, and then trim the leading/trailing spaces and shrink multiple spaces into 1:
tr <- "a ab abc B BB BBB f"
tr <- gsub(" *\\b[[:alpha:]]{1,2}\\b *", " ", tr) # Remove 1-2 letter words
gsub("^ +| +$|( ) +", "\\1", tr) # Remove excessive spacing
Result:
[1] "abc BBB"
See the IDEONE demo.
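If the same cleanup is needed in more than one place, the two gsub() calls from the answer could be wrapped in a small helper. This is only a sketch: the function name `remove_short_words` is hypothetical and not part of the original answer, and the regular expressions are copied unchanged from it.

```r
# Hypothetical helper (name is an assumption); the patterns are the answer's.
remove_short_words <- function(x) {
  # Replace each 1-2 letter word, plus any adjacent spaces, with one space.
  x <- gsub(" *\\b[[:alpha:]]{1,2}\\b *", " ", x)
  # Strip leading/trailing spaces and collapse runs of spaces into one.
  gsub("^ +| +$|( ) +", "\\1", x)
}

cleaned <- remove_short_words("a ab abc B BB BBB f")
print(cleaned)
# [1] "abc BBB"
```

Since gsub() is vectorized over its input, the helper should also accept a character vector and clean each element independently.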