Regex to replace words with more than two consecutive characters


Question

How can I detect more than two consecutive identical characters in a word and remove that word?

I seem to be able to do it like this:

# example data
mystring <- c(1, 2, 3, "toot", "tooooot")
# clunky regex
gsub("^[[:alpha:]]$", "", gsub(".*(.)\\1+\\1", "", mystring)) 
## [1] "1"    "2"    "3"    "toot" ""

But I'm sure there is a more efficient way. How can I do it with just one gsub?

Recommended answer

Use grepl instead.

mystring <- c(1, 2, 3, "toot", "tooooot", "good", "apple", "banana")
mystring[!grepl("(.)\\1{2,}", mystring)]
## [1] "1"      "2"      "3"      "toot"   "good"   "apple"  "banana"
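
Since the question asks for a single gsub, here is a minimal sketch of one way that might look. It is not part of the original answer: the `\\w*` padding, the `cleaned` variable name, and the use of `perl = TRUE` are assumptions made for illustration. Unlike the grepl approach, it keeps the offending element as an empty string, which matches the output shown in the question.

```r
# Example data from the question
mystring <- c(1, 2, 3, "toot", "tooooot")

# (\\w)\\1{2,} finds a word character followed by two or more copies of itself.
# The \\w* on both sides extends the match to the whole word containing that run.
# perl = TRUE is an assumption here; it selects the PCRE engine.
cleaned <- gsub("\\w*(\\w)\\1{2,}\\w*", "", mystring, perl = TRUE)
cleaned
## [1] "1"    "2"    "3"    "toot" ""
```

If the empty strings are not wanted afterwards, `cleaned[cleaned != ""]` drops them.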

**Explanation**

\\1 matches the first group (in this case (.)). {2,} specifies that the preceding item, here the backreference \\1, should be matched 2 or more times. Since we want to match any character repeated 3 or more times, and (.) is the first occurrence, \\1 needs to be matched 2 or more times.
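
The counting described above can be checked with a small sketch. The `words` vector and the `has_triple` name are hypothetical, picked only to show where the pattern starts to match, and `perl = TRUE` is an assumption rather than something the answer uses.

```r
# Hypothetical inputs with a run of one, two, three and four "o" characters
words <- c("to", "too", "tooo", "toooo")

# (.) captures the first occurrence; \\1{2,} demands at least two more copies,
# so only runs of three or more identical characters match.
has_triple <- grepl("(.)\\1{2,}", words, perl = TRUE)
has_triple
## [1] FALSE FALSE  TRUE  TRUE
```

Changing the quantifier to `{1,}` would lower the threshold to runs of two, which would also flag "toot" from the example data.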
