Remove duplicate chars using regex?

Views: 82 Posted: 2021/6/25 20:29:50 Tags: python regex string

Problem description

Let's say I want to remove all duplicate chars (of a particular char) in a string using regular expressions. This is simple -

import re
re.sub("a*", "a", "aaaa") # gives 'a'

What if I want to replace all duplicate chars (i.e. any of a-z) with that respective char? How do I do this?

import re
re.sub('[a-z]*', <what_to_put_here>, 'aabb') # should give 'ab'
re.sub('[a-z]*', <what_to_put_here>, 'abbccddeeffgg') # should give 'abcdefg'

NOTE: I know this remove duplicate approach can be better tackled with a hashtable or some O(n^2) algo, but I want to explore this using regexes

Solution

>>> import re
>>> re.sub(r'([a-z])\1+', r'\1', 'ffffffbbbbbbbqqq')
'fbq'

The () around the [a-z] specify a capture group, and the \1 (a backreference) in both the pattern and the replacement refers to the contents of that first capture group.

Thus, the regex reads "find a letter, followed by one or more occurrences of that same letter", and then the entire matched portion is replaced with a single occurrence of the found letter.
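
To try the backreference pattern on the strings from the question, here is a minimal sketch. The helper name collapse_runs and the constant RUN_OF_SAME_LETTER are made up for illustration; only the pattern and replacement come from the answer above.

```python
import re

# Hypothetical names; the pattern itself is the one from the answer.
RUN_OF_SAME_LETTER = re.compile(r'([a-z])\1+')

def collapse_runs(text):
    """Replace each run of a repeated lowercase letter with one copy."""
    return RUN_OF_SAME_LETTER.sub(r'\1', text)

print(collapse_runs('aabb'))           # ab
print(collapse_runs('abbccddeeffgg'))  # abcdefg
print(collapse_runs('banana'))         # banana
```

Note that this only collapses adjacent repeats: 'banana' comes back unchanged because its repeated letters are never next to each other.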

On a side note...

Your example code for just 'a' is actually buggy:

>>> re.sub('a*', 'a', 'aaabbbccc')
'abababacacaca'

You really would want to use 'a+' for your regex instead of 'a*', since the * operator matches "0 or more" occurrences, and thus will match empty strings in between two non-a characters, whereas the + operator matches "1 or more".
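
The difference between the two operators can be checked directly. This is a small sketch, not part of the original answer. It deliberately does not print the result of the 'a*' substitution, since, as far as I know, re.sub changed how it treats an empty match right after a non-empty one in Python 3.7, so that exact string may differ between versions.

```python
import re

# 'a*' can match the empty string, so it "matches" even where there is no 'a':
# once before 'b', once before 'c', and once at the end of the string.
print(re.findall('a*', 'bc'))          # ['', '', '']

# 'a+' needs at least one 'a', so only real runs of 'a' are replaced.
print(re.sub('a+', 'a', 'aaabbbccc'))  # abbbccc
print(re.sub('a+', 'a', 'aaaa'))       # a
```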
