Python regex lookahead: positive + negative
Question
This regex returns 456. My question is why it CANNOT be 234, splitting the number as 1-234-56? Does 56 satisfy the (?!\d) pattern, since it is NOT a single digit? Where is the starting point from which (?!\d) looks?
import re

pattern = re.compile(r'\d{1,3}(?=(\d{3})+(?!\d))')
a = pattern.findall("The number is: 123456")
print(a)
This is the first stage of adding a comma separator, as in 123,456.
a = pattern.findall("The number is: 123456")
print(a)
results = pattern.finditer('123456')
for result in results:
    print(result.start(), result.end(), result)
Recommended answer
"My question is why it CANNOT be 234 from 1-234-56?"
It is not possible because (?=(\d{3})+(?!\d)) requires that one or more 3-digit sequences appear after the 1-3-digit sequence, with no digit after the last of them. 56 (the last digit group in your imagined scenario) is a 2-digit group, so the lookahead fails at that position. Since a quantifier can be either lazy or greedy, you cannot match one-, two- and three-digit groups all at once with \d{1,3}. To get 234 from 123456, you'd need a regex tailored specifically to it: \B\d{3}, or (?<=1)\d{3}, or even \d{3}(?=\d{2}(?!\d)).
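As a quick check of the three tailored patterns named above, the following sketch runs each of them against 123456. The variable names are only illustrative.

```python
import re

s = "123456"

# The three tailored patterns mentioned in the answer.
candidates = [
    r'\B\d{3}',               # 3 digits that do not start at a word boundary
    r'(?<=1)\d{3}',           # 3 digits preceded by the digit 1
    r'\d{3}(?=\d{2}(?!\d))',  # 3 digits followed by exactly 2 trailing digits
]

results = {rx: re.findall(rx, s) for rx in candidates}
for rx, found in results.items():
    print(rx, found)  # each pattern finds ['234']
```

None of these patterns contains a capturing group, so re.findall returns the full match in each case.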
"Does 56 match the (?!\d) pattern? Where is the beginning point that (?!\d) will look for?"
No. This is a negative lookahead: it does not match any text, it only checks that there is no digit right after the current position in the input string. If there is a digit, the match fails at that position (no result is found or returned).
More clarification on the lookahead: it is located after the (\d{3})+ subpattern, so the regex engine checks for a digit right after the last 3-digit group and fails the match if a digit is found (as it is a negative lookahead). In plain words, (?!\d) is the closing/trailing boundary of the number in this regex.
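To see what the trailing boundary contributes, here is a small sketch that compares the pattern with and without (?!\d) on a 7-digit input. The helper full_matches and the input 1234567 are illustrative assumptions, not part of the original question.

```python
import re

with_boundary = r'\d{1,3}(?=(\d{3})+(?!\d))'
without_boundary = r'\d{1,3}(?=(\d{3})+)'

def full_matches(rx, text):
    # group(0) is the text consumed by \d{1,3}; the lookahead consumes nothing.
    return [m.group(0) for m in re.finditer(rx, text)]

print(full_matches(with_boundary, "1234567"))     # ['1', '234']
print(full_matches(without_boundary, "1234567"))  # ['123', '4']
```

With the boundary, the matches line up with the grouping 1,234,567. Without it, the 3-digit groups are no longer anchored to the end of the number, and the matches fall in the wrong places.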
A more detailed breakdown:
- \d{1,3} - a sequence of 1 to 3 digits, as many as possible (a greedy quantifier is used)
- (?=(\d{3})+(?!\d)) - a positive lookahead ((?=...)) that checks whether the 1-3 digit sequence just matched is followed by
  - (\d{3})+ - 1 or more (+) sequences of exactly 3 digits...
  - (?!\d) - ...that are not followed by a digit.
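The same breakdown can be written directly into the pattern with re.VERBOSE, which ignores whitespace in the pattern and allows inline comments. This is only a sketch of an equivalent way to spell the regex from the question.

```python
import re

pattern = re.compile(r"""
    \d{1,3}        # 1 to 3 digits, as many as possible (greedy)
    (?=            # positive lookahead: what follows must be...
        (\d{3})+   #   one or more groups of exactly 3 digits
        (?!\d)     #   negative lookahead: and then no further digit
    )
""", re.VERBOSE)

m = pattern.search("The number is: 123456")
print(m.group(0))  # 123 (the text actually consumed)
print(m.group(1))  # 456 (captured inside the lookahead)
```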
Lookaheads are zero-width: they do not consume characters, but you can still capture inside them. When a lookahead is executed, the regex index stays at the same character as before. With your regex and input, you match 123 with \d{1,3}, as it is followed by a 3-digit sequence (456). But 456 is captured within the lookahead, and re.findall returns only the captured texts if capturing groups are set.
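The difference between what is matched and what re.findall returns can be seen by comparing the capturing group with a non-capturing one. A minimal sketch, with illustrative variable names:

```python
import re

text = "The number is: 123456"

capturing = re.compile(r'\d{1,3}(?=(\d{3})+(?!\d))')
non_capturing = re.compile(r'\d{1,3}(?=(?:\d{3})+(?!\d))')

# With a capturing group, findall returns the group's text, not the match.
print(capturing.findall(text))      # ['456']

# With (?:...), there is no group, so findall returns the full match.
print(non_capturing.findall(text))  # ['123']

# finditer always exposes the full match through group(0).
full = [m.group(0) for m in capturing.finditer(text)]
print(full)                         # ['123']
```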
To just add a comma as the digit grouping symbol, use
rx = r'\d(?=(?:\d{3})+(?!\d))'
See the IDEONE demo.
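One way this pattern might be applied is with re.sub, re-inserting the matched digit followed by a comma. The helper name add_commas is hypothetical, and the sketch assumes plain integers in the input; digits after a decimal point would be grouped too.

```python
import re

rx = r'\d(?=(?:\d{3})+(?!\d))'

def add_commas(text):
    # \g<0> is the whole match (a single digit); a comma is appended to it.
    return re.sub(rx, r'\g<0>,', text)

print(add_commas("The number is: 123456"))  # The number is: 123,456
print(add_commas("1234567"))                # 1,234,567
```

Because the group inside the lookahead is non-capturing and only one digit is consumed per match, every digit that is followed by a whole number of 3-digit groups gets a comma after it.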