R: Find and remove all one to two letter words

Question

I am trying to remove every one- or two-letter word from a text passage. This was my first attempt:

gsub(" [a-zA-Z]{1,2} ", " ", "a ab abc B BB BBB")
[1] "a abc BB BBB"

I can see why the "a" is not replaced: it is not preceded by a space. I can also see why the "BB" is not replaced: the space before it has already been consumed by the " B " match.
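
To make the consumed-space behaviour concrete, here is a minimal sketch (not part of the original question) that lists the non-overlapping matches of the first pattern using base R's gregexpr() and regmatches(). The variable names x and matches are only for illustration.

```r
x <- "a ab abc B BB BBB"

# gregexpr() finds every non-overlapping match; regmatches() extracts them.
matches <- regmatches(x, gregexpr(" [a-zA-Z]{1,2} ", x))[[1]]
print(matches)
# [1] " ab " " B "
```

Only " ab " and " B " are matched. The leading "a" has no space before it, and the space in front of "BB" belongs to the " B " match, so the regex engine never sees " BB " as a candidate.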

Recommended answer

You can use the \b word boundary and the [[:alpha:]] bracket expression with the {1,2} limiting quantifier, then trim the leading/trailing spaces and collapse runs of multiple spaces into one:

tr <- "a ab abc B BB BBB f"
tr <- gsub(" *\\b[[:alpha:]]{1,2}\\b *", " ", tr) # Remove 1-2 letter words
gsub("^ +| +$|( ) +", "\\1", tr) # Remove excessive spacing

Result:

[1] "abc BBB"
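
If the same cleanup is needed in several places, the two gsub() calls could be wrapped in a small function. The sketch below is one way to do that; the name remove_short_words and the max_len parameter are hypothetical and do not come from the original answer.

```r
# Hypothetical helper (not from the original answer): wraps the two gsub() steps.
remove_short_words <- function(x, max_len = 2) {
  # Pattern for whole words of 1..max_len letters plus any surrounding spaces
  pattern <- sprintf(" *\\b[[:alpha:]]{1,%d}\\b *", as.integer(max_len))
  x <- gsub(pattern, " ", x)
  # Trim leading/trailing spaces and collapse runs of spaces to a single space
  gsub("^ +| +$|( ) +", "\\1", x)
}

cleaned <- remove_short_words("a ab abc B BB BBB f")
print(cleaned)
# [1] "abc BBB"
```

Because the \b boundaries do not consume any characters, adjacent short words such as "B" and "BB" are each matched on their own, which is what the space-delimited first attempt could not do.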

See the IDEONE demo.
