Python regex word boundary with unexpected results
Problem description
import re
sstring = "ON Any ON Any"
regex1 = re.compile(r''' \bON\bANY\b''', re.VERBOSE)
regex2 = re.compile(r'''\b(ON)?\b(Any)?''', re.VERBOSE)
regex3 = re.compile(r'''\b(?:ON)?\b(?:Any)?''', re.VERBOSE)
for a in regex1.findall(sstring): print(a)
print("----------")
for a in regex2.findall(sstring): print(a)
print("----------")
for a in regex3.findall(sstring): print(a)
print("----------")
Output:
----------
('ON', '')
('', '')
('', 'Any')
('', '')
('ON', '')
('', '')
('', 'Any')
('', '')
----------
ON

Any

ON

Any

----------
Having read many articles on the internet and on SO, I still don't think I understand the regex word boundary, \b.
The first regex doesn't give me the expected result. I think it should give me "ON Any ON Any", but it doesn't.
The second regex gives me tuples, and I don't understand why, or what ('', '') means.
The third regex prints the results on separate lines, with empty lines in between.
Could you please help me understand this?
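Answer
Note that to match ON ANY, you need to add an escaped space between ON and ANY (it must be escaped because you are using the re.VERBOSE flag, which ignores unescaped whitespace in the pattern). The \b word boundary is a zero-width assertion: it does not consume any text, it only asserts a position between specific characters. After ON, \b succeeds at the position just before the space, but the pattern then expects A at that same position, and nothing in the pattern consumes the space. That is why your first re.compile(r''' \bON\bANY\b''', re.VERBOSE) approach fails.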
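To see that \b consumes nothing, you can list every position where it matches in the question's string: each match is an empty string, and the positions are exactly where a word character meets a non-word character or a string edge. A small sketch (variable names other than sstring and regex1 are illustrative):

```python
import re

sstring = "ON Any ON Any"

# \b is a zero-width assertion: every match is an empty string located where a
# word character (\w) meets a non-word character or the start/end of the string.
boundary_matches = list(re.finditer(r"\b", sstring))
boundary_positions = [m.start() for m in boundary_matches]
print(boundary_positions)  # [0, 2, 3, 6, 7, 9, 10, 13]
print(all(m.group() == "" for m in boundary_matches))  # True: nothing is consumed

# The first pattern from the question. re.VERBOSE ignores the leading space, so
# this is \bON\bANY\b. After "ON", \b succeeds at position 2, but the pattern then
# wants "A" at that same position, where the space is; nothing consumes the space.
regex1 = re.compile(r''' \bON\bANY\b''', re.VERBOSE)
print(regex1.findall(sstring))  # []

# Without the space it still fails: N and A are both word characters,
# so there is no boundary between them.
print(regex1.findall("ONANY"))  # []
```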
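Use the following pattern (re.IGNORECASE is needed as well, because the pattern spells ANY in uppercase while the input contains Any):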
rx = re.compile(r''' \bON\ ANY\b ''', re.VERBOSE|re.IGNORECASE)
See the Python demo
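A minimal check of the suggested pattern against the question's string, plus two variants that show why both the escaped space and re.IGNORECASE matter (the variant names are made up for illustration). It returns two separate ON Any matches, since the pattern describes a single ON ANY pair:

```python
import re

sstring = "ON Any ON Any"

# In VERBOSE mode unescaped whitespace is ignored, so the space between ON and
# ANY must be written as "\ ". IGNORECASE lets ANY match "Any".
rx = re.compile(r''' \bON\ ANY\b ''', re.VERBOSE | re.IGNORECASE)
matches = rx.findall(sstring)
print(matches)  # ['ON Any', 'ON Any']

# Unescaped space: VERBOSE drops it, leaving \bONANY\b, which cannot match.
rx_unescaped = re.compile(r''' \bON ANY\b ''', re.VERBOSE | re.IGNORECASE)
print(rx_unescaped.findall(sstring))  # []

# Without IGNORECASE the uppercase ANY does not match "Any".
rx_case_sensitive = re.compile(r''' \bON\ ANY\b ''', re.VERBOSE)
print(rx_case_sensitive.findall(sstring))  # []
```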
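The re.compile(r'''\b(ON)?\b(Any)?''', re.VERBOSE) pattern makes findall return tuples because you defined (...) capturing groups in the pattern: when a pattern contains more than one group, findall returns one tuple of group values per match, and a group that did not take part in the match is reported as ''. Because both groups are optional, the pattern can also match the empty string at a word boundary (for example, right after ON), and such a match gives ('', '').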
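To see where the ('', '') tuples come from, it helps to print each match's span next to its groups. This sketch uses finditer with Match.groups(default='') so that unmatched groups appear as '', the same way findall reports them (spans_and_groups is an illustrative name):

```python
import re

sstring = "ON Any ON Any"
regex2 = re.compile(r'''\b(ON)?\b(Any)?''', re.VERBOSE)

# With two capturing groups, findall gives one (group1, group2) tuple per match;
# a group that did not take part in the match is reported as ''.
print(regex2.findall(sstring))

# finditer exposes the span of each match. Empty spans such as (2, 2) are the
# zero-width matches at word boundaries that produce ('', '').
spans_and_groups = [(m.span(), m.groups(default="")) for m in regex2.finditer(sstring)]
for span, groups in spans_and_groups:
    print(span, groups)
# (0, 2) ('ON', '')
# (2, 2) ('', '')
# (3, 6) ('', 'Any')
# (6, 6) ('', '')
# (7, 9) ('ON', '')
# (9, 9) ('', '')
# (10, 13) ('', 'Any')
# (13, 13) ('', '')
```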
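The re.compile(r'''\b(?:ON)?\b(?:Any)?''', re.VERBOSE) pattern matches optional sequences, either ON or Any. Since (?:...) groups are non-capturing, findall returns the whole matched text, so you get those words as values. You get empty values as well because this regex can match just a word boundary (all other subpatterns are optional); printing an empty string outputs a blank line, which is where the empty lines between the words come from.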
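If only the words are wanted, two options are sketched below: drop the empty strings after findall, or (an alternative pattern, not taken from the question) require exactly one of the words with alternation so the regex can no longer match the empty string:

```python
import re

sstring = "ON Any ON Any"

# Pattern from the question: every part is optional, so zero-width matches
# at word boundaries show up as empty strings.
regex3 = re.compile(r'''\b(?:ON)?\b(?:Any)?''', re.VERBOSE)
all_matches = regex3.findall(sstring)
print(all_matches)  # ['ON', '', 'Any', '', 'ON', '', 'Any', '']

# Option 1: filter out the empty strings afterwards.
non_empty = [m for m in all_matches if m]
print(non_empty)  # ['ON', 'Any', 'ON', 'Any']

# Option 2 (alternative pattern): require one of the words between word
# boundaries, so an empty match is impossible.
words_only = re.findall(r'\b(?:ON|Any)\b', sstring)
print(words_only)  # ['ON', 'Any', 'ON', 'Any']
```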
More details about word boundaries:
- Word boundaries at Regular-Expressions.info
- Java Regex Word Boundaries (written for Java, but the same word-boundary concept applies here)