Lookahead and non-capturing regular expressions

Question

I'm trying to match the local part of an email address (the part before the @ character) with:

LOCAL_RE_NOTQUOTED = """
((
\w         # alphanumeric and _
| [!#$%&'*+-/=?^_`{|}~]          # special chars, but no dot at beginning
)
(
\w         # alphanumeric and _
| [!#$%&'*+-/=?^_`{|}~]          # special characters
| ([.](?![.])) # negative lookahead to avoid pairs of dots. 
)*)
(?<!\.)(?:@)           # no end with dot before @
"""

Running the following test:

re.match(LOCAL_RE_NOTQUOTED, "a.a..a@", re.VERBOSE).group()

gives:

'a.a..a@'

Why does the @ appear in the output, even though I'm using a non-capturing group (?:@)?

Running this second test:

 re.match(LOCAL_RE_NOTQUOTED, "a.a..a@", re.VERBOSE).groups()

gives:

('a.a..a', 'a', 'a', None)

Why does the regex not reject the string containing a pair of dots '..'?

Recommended answer

You're confusing non-capturing groups (?:...) with lookahead assertions (?=...).

The former do participate in the match, and are thus part of match.group(), which contains the overall match. They just don't create a numbered group that can be referenced later (\1 inside the pattern, or match.group(1) in Python). A lookahead, by contrast, only checks what follows and consumes nothing.
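The difference is easy to see side by side. The following is a minimal sketch; the sample address and the variable names are made up for illustration.

```python
import re

text = "user@example.com"  # hypothetical sample input

# Non-capturing group: the @ is consumed, so it is part of the overall match,
# but no numbered group is created for it.
consumed = re.match(r"\w+(?:@)", text)
print(consumed.group())   # user@
print(consumed.groups())  # ()

# Lookahead: the @ must follow, but it is not consumed.
peeked = re.match(r"\w+(?=@)", text)
print(peeked.group())     # user
print(peeked.groups())    # ()
```

In both cases groups() is empty; only the lookahead keeps the @ out of group().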

The second problem (why is the double dot matched?) is a bit trickier. It is caused by an error in your regex. When you wrote (shortened to make the point)

[+-/]

you wrote "match one character in the range from + to /", and in ASCII the dot lies inside that range (ASCII 43-47: + , - . /). Therefore the character class matches the dot first, and the lookahead alternative never gets a chance to reject it. To have the dash treated as a literal dash, place it at the end of the character class (the final (?:@) is also changed to the lookahead (?=@) here, so the @ is no longer part of the match):

((
\w         # alphanumeric and _
| [!#$%&'*+/=?^_`{|}~-]          # special chars, but no dot at beginning
)
(
\w         # alphanumeric and _
| [!#$%&'*+/=?^_`{|}~-]          # special characters
| ([.](?![.])) # negative lookahead to avoid pairs of dots. 
)*)
(?<!\.)(?=@)           # no end with dot before @
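To check both points, here is a short sketch that lists the characters covered by the accidental range and then runs the corrected pattern on a few inputs. The names LOCAL_RE_FIXED and local_part, and the test strings other than "a.a..a@", are illustrative assumptions and not part of the original post.

```python
import re

# Characters covered by the accidental range + through / (code points 43-47).
in_range = [chr(c) for c in range(ord("+"), ord("/") + 1)]
print(in_range)  # ['+', ',', '-', '.', '/']

# Corrected pattern from the answer, as a raw string compiled in verbose mode.
LOCAL_RE_FIXED = re.compile(r"""
((
\w                               # alphanumeric and _
| [!#$%&'*+/=?^_`{|}~-]          # special chars, but no dot at beginning
)
(
\w                               # alphanumeric and _
| [!#$%&'*+/=?^_`{|}~-]          # special characters
| ([.](?![.]))                   # dot not followed by another dot
)*)
(?<!\.)(?=@)                     # no dot right before the at sign, which is not consumed
""", re.VERBOSE)


def local_part(address):
    """Hypothetical helper: return the matched local part, or None."""
    m = LOCAL_RE_FIXED.match(address)
    return m.group() if m else None


print(local_part("a.a.a@"))   # a.a.a
print(local_part("a.a..a@"))  # None (pair of dots)
print(local_part("a.@"))      # None (dot right before @)
print(local_part(".a@"))      # None (dot at the beginning)
```

Note that the accidental range also let a comma through, which the corrected class no longer accepts.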

And of course, if you want to keep this logic, you can streamline it a bit:

^(?!\.)                   # no dot at the beginning
(?:
[\w!#$%&'*+/=?^_`{|}~-]   # alnums or special characters except dot
| (\.(?![.@]))            # or dot unless it's before a dot or @ 
)*
(?=@)                     # end before @
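A quick way to try the streamlined pattern is sketched below. STREAMLINED_RE and local_part are hypothetical names, and the inputs are made-up test cases.

```python
import re

# Streamlined pattern from the answer, compiled in verbose mode.
STREAMLINED_RE = re.compile(r"""
^(?!\.)                   # no dot at the beginning
(?:
[\w!#$%&'*+/=?^_`{|}~-]   # alnums or special characters except dot
| (\.(?![.@]))            # or dot unless it's before a dot or the at sign
)*
(?=@)                     # end before the at sign
""", re.VERBOSE)


def local_part(address):
    """Hypothetical helper: return the matched local part, or None."""
    m = STREAMLINED_RE.match(address)
    return m.group() if m else None


print(local_part("first.last+tag@example.com"))  # first.last+tag
print(local_part("a.a..a@"))                     # None
print(local_part("a.@"))                         # None
print(local_part(".a@"))                         # None
print(repr(local_part("@")))                     # '' (empty local part is accepted)
```

As the last line shows, the * quantifier lets an empty local part through. If that is not wanted, replacing * with + would presumably be the smallest change, though that goes beyond what the answer itself proposes.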
