Python regex word boundary with unexpected results


Problem description

import re
sstring = "ON Any ON Any"
regex1 = re.compile(r''' \bON\bANY\b''', re.VERBOSE)
regex2 = re.compile(r'''\b(ON)?\b(Any)?''', re.VERBOSE)
regex3 = re.compile(r'''\b(?:ON)?\b(?:Any)?''', re.VERBOSE)
for a in regex1.findall(sstring): print(a)
print("----------")
for a in regex2.findall(sstring): print(a)
print("----------")
for a in regex3.findall(sstring): print(a)
print("----------")


Output:

----------
('ON', '')
('', '')
('', 'Any')
('', '')
('ON', '')
('', '')
('', 'Any')
('', '')
----------
ON

Any

ON

Any

----------


Having read many articles on the internet and on S.O., I think I still don't understand the regex word boundary: \b

The first regex doesn't give me the expected result. I think it should give me "ON Any ON Any", but it prints nothing.

The second regex gives me tuples, and I don't know why or what ('', '') means.

The third regex prints the results on separate lines with empty lines in between.

Could you please help me understand this?

Solution

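Note that to match ON ANY you need to add a space between ON and ANY, and that space must be escaped because you are using the re.VERBOSE flag, which otherwise ignores whitespace in the pattern. The \b word boundary is a zero-width assertion: it does not consume any text, it only asserts a position between a word character and a non-word character (or the start or end of the string). In \bON\bANY\b the middle \b would have to hold between N and A, two word characters, which is never true. That is why your first approach, re.compile(r''' \bON\bANY\b''', re.VERBOSE), fails.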
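To see where \b actually holds, here is a small sketch that lists the boundary offsets in the question's string. The variable names positions and glued are only illustrative.

```python
import re

sstring = "ON Any ON Any"

# \b matches an empty string, so each match starts and ends at the same offset.
positions = [m.start() for m in re.finditer(r"\b", sstring)]
print(positions)  # [0, 2, 3, 6, 7, 9, 10, 13]

# A boundary between two word characters never exists, so a \b placed
# between "ON" and "ANY" cannot match even when the words are glued together.
glued = re.search(r"ON\bANY", "ONANY")
print(glued)  # None
```

Every boundary sits at the edge of a word; the space itself (offsets 2 to 3, 6 to 7, 9 to 10) is real text that the pattern still has to consume.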

Use

rx = re.compile(r''' \bON\ ANY\b ''', re.VERBOSE|re.IGNORECASE)
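As a quick check, the sketch below runs the original pattern and the suggested one against the string from the question. The names broken, broken_matches and fixed_matches are made up for the comparison.

```python
import re

sstring = "ON Any ON Any"

# Original pattern: no space between ON and ANY, and it is case-sensitive.
broken = re.compile(r''' \bON\bANY\b''', re.VERBOSE)

# Suggested pattern: "\ " is a literal space that survives re.VERBOSE,
# and re.IGNORECASE lets ANY match "Any".
rx = re.compile(r''' \bON\ ANY\b ''', re.VERBOSE | re.IGNORECASE)

broken_matches = broken.findall(sstring)
fixed_matches = rx.findall(sstring)

print(broken_matches)  # []
print(fixed_matches)   # ['ON Any', 'ON Any']
```

Under re.VERBOSE, writing the space as [ ] or using \s+ should work as well, since whitespace inside a character class and escape sequences are not stripped.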


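The re.compile(r'''\b(ON)?\b(Any)?''', re.VERBOSE) pattern makes findall return tuples because you defined (...) capturing groups in it. With more than one group, each result is a tuple of the group values, and a group that did not take part in the match shows up as an empty string.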
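A minimal illustration of how the number of capturing groups changes what findall returns; the short text and the three patterns are invented for this comparison only.

```python
import re

text = "ON Any"

# Two capturing groups: each result is a (group 1, group 2) tuple.
two_groups = re.findall(r"(ON) (Any)", text)

# One capturing group: each result is that group's text only.
one_group = re.findall(r"(ON) Any", text)

# No capturing groups: each result is the whole match.
no_groups = re.findall(r"(?:ON) (?:Any)", text)

print(two_groups)  # [('ON', 'Any')]
print(one_group)   # ['ON']
print(no_groups)   # ['ON Any']
```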

The re.compile(r'''\b(?:ON)?\b(?:Any)?''', re.VERBOSE) pattern matches optional sequences, either ON or Any, so you get those words as values. You also get empty values because every subpattern other than \b is optional, so the regex can succeed at a bare word boundary without consuming any text.
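To make the empty values visible, this sketch prints the span of every match of the third regex; on a current Python 3 the empty matches land on the boundaries right after each word. The alternation at the end is one possible way to avoid them, assuming the goal is simply to collect the two words; it is not part of the original answer.

```python
import re

sstring = "ON Any ON Any"
regex3 = re.compile(r'''\b(?:ON)?\b(?:Any)?''', re.VERBOSE)

# Record where each match starts and ends, plus the matched text.
matches = [(m.span(), m.group()) for m in regex3.finditer(sstring)]
for span, text in matches:
    print(span, repr(text))
# (0, 2) 'ON'
# (2, 2) ''
# (3, 6) 'Any'
# (6, 6) ''
# (7, 9) 'ON'
# (9, 9) ''
# (10, 13) 'Any'
# (13, 13) ''

# Requiring one of the words (no "?") leaves nothing optional,
# so the pattern can no longer match an empty string.
words = re.findall(r"\b(?:ON|Any)\b", sstring)
print(words)  # ['ON', 'Any', 'ON', 'Any']
```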

