How to use re to find consecutive, repeated chars


Question

I want to find all blocks of consecutive, repeated characters in a string. For example, consider the following:

s = r'http://www.google.com/search=ooo-jjj'

I want to find these: www, ooo and jjj.

I tried doing it like this:

m = re.search(r'(\w)\1\1', s)

But it doesn't seem to work as I expect. Any ideas?

Also, how can I do it in Bash?

Recommended answer

((\w)\2{2,}) matches 3 or more consecutive occurrences of the same character:

In [71]: import re
In [72]: s = r'http://www.google.com/search=ooo-jjjj'
In [73]: re.findall(r'((\w)\2{2,})', s)
Out[73]: [('www', 'w'), ('ooo', 'o'), ('jjjj', 'j')]

In [78]: [match[0] for match in re.findall(r'((\w)\2{2,})', s)]
Out[78]: ['www', 'ooo', 'jjjj']
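
As an alternative to unpacking the tuples that re.findall returns, the same runs can be read from match objects. The following is a minimal sketch, assuming re.finditer is acceptable here; the names pattern and runs are illustrative and not part of the original answer. With match objects the outer group is no longer needed, so the back-reference becomes \1.

```python
import re

s = r'http://www.google.com/search=ooo-jjjj'

# A single capturing group is enough when the whole match is read
# from the match object instead of from findall's tuples.
pattern = re.compile(r'(\w)\1{2,}')

# m.group(0) is the entire run; m.group(1) is the repeated character.
runs = [m.group(0) for m in pattern.finditer(s)]
print(runs)  # ['www', 'ooo', 'jjjj']
```

This avoids the extra list comprehension over tuples, at the cost of iterating over match objects.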

(\w) matches any word character, that is, a letter, a digit, or an underscore.

((\w)\2) matches any word character followed by the same character, since \2 matches the contents of group number 2. Since I nested the parentheses, group number 1 is the outer pair and group number 2 refers to the character matched by \w.
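
To make the group numbering concrete, here is a small sketch that applies the two-character pattern to the sample word 'google' (chosen for illustration; the variable names are assumptions). Groups are numbered by the position of their opening parenthesis, so the outer pair is group 1 and the inner (\w) is group 2.

```python
import re

# Search for the first doubled character in the sample word.
m = re.search(r'((\w)\2)', 'google')

outer = m.group(1)  # the whole doubled pair, captured by the outer parentheses
inner = m.group(2)  # the single character captured by the inner (\w)
print(outer, inner)  # oo o
```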

Then putting it all together, ((\w)\2{2,}) matches any word character, followed by the same character repeated 2 or more additional times.

In total, that means the regex requires the character to appear 3 or more times in a row.
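
The relationship between the {2,} quantifier and the minimum run length can be sketched as a small helper. This is only an illustration under stated assumptions: find_runs and its min_len parameter are hypothetical names, not part of the original answer, and the pattern uses a single group with \1 because the whole match is read from the match object.

```python
import re

def find_runs(text, min_len=3):
    """Return every run of one word character repeated at least min_len times."""
    if min_len < 2:
        raise ValueError("min_len must be at least 2")
    # (\w) consumes the first character; the back-reference must then
    # match min_len - 1 or more further copies of it.
    pattern = re.compile(r'(\w)\1{%d,}' % (min_len - 1))
    return [m.group(0) for m in pattern.finditer(text)]

s = 'http://www.google.com/search=ooo-jjjj'
print(find_runs(s))     # ['www', 'ooo', 'jjjj']
print(find_runs(s, 4))  # ['jjjj']
```

With min_len=3 the generated pattern is (\w)\1{2,}, which is the single-group form of the regex discussed above.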
