How to use re to find consecutive, repeated chars
Question
I want to find all consecutive, repeated character blocks in a string. For example, consider the following:
s = r'http://www.google.com/search=ooo-jjj'
I want to find these: www, ooo and jjj.
I tried this:
m = re.search(r'(\w)\1\1', s)
But it doesn't seem to work as I expect. Any ideas?
Also, how can I do it in Bash?
Recommended answer
((\w)\2{2,}) matches a run of 3 or more consecutive identical characters:
In [71]: import re
In [72]: s = r'http://www.google.com/search=ooo-jjjj'
In [73]: re.findall(r'((\w)\2{2,})', s)
Out[73]: [('www', 'w'), ('ooo', 'o'), ('jjjj', 'j')]
In [78]: [match[0] for match in re.findall(r'((\w)\2{2,})', s)]
Out[78]: ['www', 'ooo', 'jjjj']
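Because re.findall returns one tuple per match when the pattern contains more than one group, the list comprehension above is needed to pull out the full run. One possible alternative, sketched here under the assumption that only the matched text is wanted, is to iterate with re.finditer and read group(0), the whole match. The single-group pattern (\w)\1{2,} is an equivalent rewrite for illustration, not part of the original answer:

```python
import re

s = r'http://www.google.com/search=ooo-jjjj'

# (\w)\1{2,}: one word character followed by at least two more copies of it.
# group(0) is the entire match, so no outer capturing group is required.
runs = [m.group(0) for m in re.finditer(r'(\w)\1{2,}', s)]
print(runs)  # ['www', 'ooo', 'jjjj']
```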
(\w) matches any word character: a letter, a digit, or an underscore.
((\w)\2) matches any word character followed by the same character, since \2 matches the contents of group number 2. Because the parentheses are nested, group number 1 is the outer pair and group number 2 refers to the character matched by \w.
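The group numbering can be checked directly. This is a small sketch using the word 'google' as an arbitrary test input (chosen here for illustration); groups are numbered by the position of their opening parenthesis:

```python
import re

m = re.search(r'((\w)\2)', 'google')

# Group 1 is the outer pair: the character plus its repeat.
print(m.group(1))  # oo
# Group 2 is the inner (\w): the single character that \2 refers back to.
print(m.group(2))  # o
```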
Putting it all together, ((\w)\2{2,}) matches any word character followed by the same character repeated 2 or more additional times.
In total, that means the regex requires the character to appear 3 or more times in a row.
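As for the attempt in the question: re.search(r'(\w)\1\1', s) does match, but re.search stops after the first hit, which is likely why it did not behave as expected. The sketch below shows that behaviour on the question's string, and also checks the 3-character threshold on two illustrative inputs ('google' and 'ooo', chosen here as examples):

```python
import re

s = r'http://www.google.com/search=ooo-jjj'

# re.search returns only the first match, so 'ooo' and 'jjj' are never reported.
first = re.search(r'(\w)\1\1', s)
print(first.group(0))  # www

pattern = re.compile(r'((\w)\2{2,})')

# A run of two ('oo' in 'google') is too short for {2,} after the first character.
print(pattern.search('google'))        # None
# A run of three meets the minimum.
print(pattern.search('ooo').group(1))  # ooo
```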