Remove duplicate chars using regex?
Let's say I want to remove all duplicate chars (of a particular char) in a string using regular expressions. This is simple:
import re
re.sub("a*", "a", "aaaa") # gives 'a'
What if I want to replace all duplicate chars (i.e. any of a-z) with that respective char? How do I do this?
import re
re.sub('[a-z]*', <what_to_put_here>, 'aabb') # should give 'ab'
re.sub('[a-z]*', <what_to_put_here>, 'abbccddeeffgg') # should give 'abcdefg'
NOTE: I know that removing duplicates can be better tackled with a hashtable or some O(n^2) algo, but I want to explore this using regexes.
>>> import re
>>> re.sub(r'([a-z])\1+', r'\1', 'ffffffbbbbbbbqqq')
'fbq'
The () around the [a-z] specifies a capture group, and the \1 (a backreference) in both the pattern and the replacement refers to the contents of the first capture group.
Thus, the regex reads "find a letter, followed by one or more occurrences of that same letter", and the entire matched portion is then replaced with a single occurrence of that letter.
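As a minimal sketch, here is the same pattern applied to the inputs from the question. The helper name collapse_runs is made up for illustration and is not part of the re module:

```python
import re

# Hypothetical helper, named here only for illustration.
def collapse_runs(text):
    # ([a-z]) captures one lowercase letter; \1+ then requires one or more
    # further copies of that same letter immediately after it.
    return re.sub(r'([a-z])\1+', r'\1', text)

print(collapse_runs('aabb'))           # ab
print(collapse_runs('abbccddeeffgg'))  # abcdefg
print(collapse_runs('aabbaa'))         # aba
```

Note that only consecutive repeats are collapsed: in 'aabbaa' the second run of a survives because it is not adjacent to the first one.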
On a side note...
Your example code for just a is actually buggy:
>>> re.sub('a*', 'a', 'aaabbbccc')
'abababacacaca'
You really want to use 'a+' for your regex instead of 'a*', since the * operator matches "0 or more" occurrences and thus also matches the empty string between two non-a characters, whereas the + operator matches "1 or more".
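The difference can be checked directly. This is a small sketch that uses re.findall only to make the empty matches visible. The exact result of the 'a*' substitution depends on the Python version (the handling of empty matches in re.sub changed in Python 3.7), so only the 'a+' result is shown:

```python
import re

# 'a*' can match zero characters, so it "finds" an empty string at every
# position of a string that contains no 'a' at all.
empty_matches = re.findall('a*', 'bb')
print(empty_matches)  # ['', '', '']

# 'a+' needs at least one 'a', so it finds nothing in the same string.
no_matches = re.findall('a+', 'bb')
print(no_matches)  # []

# With 'a+', only the real run of a's is replaced.
fixed = re.sub('a+', 'a', 'aaabbbccc')
print(fixed)  # abbbccc
```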