Regex help on non-capturing groups

Question

This must be a duplicate, but I can't seem to find it...

I am using a group to match a repeating substring. However, I do not want the group to be captured. This seems to be a contradiction.

To be explicit, suppose I want to find any character that follows three exact consecutive copies of an all-uppercase substring. For

s = 'hjgABABABfgfBBdqCCCugDDD'
              |         |

it should return

['f', 'u']

I can find the repeating strings and the following character just fine with

import re
print(re.findall(r'([A-Z]+)\1{2}(.)', s))

which gives

[('AB', 'f'), ('C', 'u')]

I can easily parse the resulting list and take just the second items. But is there a regex way to get only the second items to begin with? If I try

print(re.findall(r'(?:[A-Z]+)\1{2}(.)', s))

I get

    raise source.error("invalid group reference", len(escape))
sre_constants.error: invalid group reference at position 10

I'd appreciate a short confirmation that the issue is indeed a conflict between the non-capturing requirement and the capturing that is needed to detect repetitions, and then a clever idea for achieving the goal neatly.

Recommended answer

The reason this won't work is that when you write \1 you are basically saying "the content of the first group", which is of course undefined if that group is non-capturing: a (?:...) group is never numbered, so there is no group 1 for \1 to refer to at that point in the pattern.
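
The point can be checked directly. The following is only a minimal sketch, assuming the standard-library re module and the string s from the question; the names capturing, compile_error and following are illustrative and not part of the original post. It keeps the capturing group, since the backreference needs it, and reads only the second group from each match, which is one possible way to get the desired list rather than a pure-regex solution.

```python
import re

s = 'hjgABABABfgfBBdqCCCugDDD'

# Capturing groups are numbered; (?:...) groups are not.
capturing = re.compile(r'([A-Z]+)\1{2}(.)')
print(capturing.groups)  # number of capturing groups in the pattern

# With (?:...) there is no group 1 when \1 is parsed, so compiling fails.
try:
    re.compile(r'(?:[A-Z]+)\1{2}(.)')
    compile_error = None
except re.error as exc:
    compile_error = str(exc)
print(compile_error)

# Keep group 1 for the backreference, but collect only group 2 per match.
following = [m.group(2) for m in re.finditer(r'([A-Z]+)\1{2}(.)', s)]
print(following)
```

With re.finditer each match object exposes its groups individually, so the first group can still drive the repetition check without appearing in the result. The exact wording of the error message may differ between Python versions.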
