Regular expression in regards to question mark "lazy" mode
Question
I understand that the ? mark here means "lazy".
My question is essentially about [0-9]{2}? vs [0-9]{2}.
Are they the same?
If so, why would we write the former expression? Isn't lazy mode more expensive performance-wise?
If not, can you tell me the difference?
Recommended answer
In most regex flavors there is no difference between [0-9]{2}? and [0-9]{2}. Both match exactly two digits.
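One quick way to check this is to run both patterns over the same input. The sketch below uses Python's re module; the sample string and variable names are made up for illustration and are not part of the original question.

```python
import re

sample = "12345"  # hypothetical input, chosen only for this demo

# Fixed-count quantifier, without and with the trailing "?"
plain = re.findall(r"[0-9]{2}", sample)
lazy = re.findall(r"[0-9]{2}?", sample)

print(plain)  # ['12', '34']
print(lazy)   # ['12', '34']
```

Both calls consume exactly two digits per match, so the results are identical.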
The difference between greedy matching and lazy matching (the addition of a ?) has to do with backtracking. Regular expression engines are built to match text from left to right, so when you ask an expression to match a range of characters, it matches as many as possible by default.
Assume we have the string acac123.
If we use a greedy match of [a-z]+c (+ standing for 1+ repetitions, or {1,}):
- [a-z]+ would match acac and stop at 1
- then we would try to match the c, but fail at 1
- now we start backtracking, and successfully match aca and c
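The outcome of those steps can be observed by wrapping each half of the pattern in a capture group. This is a small sketch using Python's re module; the capture groups and variable names are additions for illustration only.

```python
import re

text = "acac123"

# Capture what each half of the greedy pattern ends up holding.
greedy = re.match(r"([a-z]+)(c)", text)

print(greedy.group(0))  # acac
print(greedy.group(1))  # aca
print(greedy.group(2))  # c
```

The first group ends with aca rather than acac because the engine had to give one character back so the trailing c could match.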
If we make this lazy ([a-z]+?c), we will both get a different result (in this case) and be more efficient:
- [a-z]+? would match a, then stop and try the rest of the expression, because a lazy quantifier consumes as little as possible; the next character matches c
- the c would then match, successfully matching a and c (with no backtracking)
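The lazy variant can be inspected the same way. Again, this is only a sketch in Python's re module, with capture groups added purely to show what each part of the pattern holds.

```python
import re

text = "acac123"

# Same pattern as the greedy case, but with a lazy "+?" quantifier.
lazy = re.match(r"([a-z]+?)(c)", text)

print(lazy.group(0))  # ac
print(lazy.group(1))  # a
print(lazy.group(2))  # c
```

Here the overall match is ac instead of acac, which is the "different result" mentioned above.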
Now you can see that there will be no difference between X{#}? and X{#}, because {#} is a fixed count rather than a range, and even a greedy match will not experience any backtracking within it. Lazy matches are often used with * (0+ repetitions, or {0,}) or +, but can also be used with ranges {m,n} (where n is optional).
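To illustrate how laziness does change the result once the quantifier is a real range, here is a short Python sketch. The input string and variable names are hypothetical.

```python
import re

digits = "12345"  # hypothetical input for the demo

# Bounded range {2,4}: greedy takes the maximum, lazy takes the minimum.
greedy_range = re.match(r"[0-9]{2,4}", digits).group()
lazy_range = re.match(r"[0-9]{2,4}?", digits).group()

# Open-ended range {2,}: n is omitted, so greedy takes everything available.
greedy_open = re.match(r"[0-9]{2,}", digits).group()
lazy_open = re.match(r"[0-9]{2,}?", digits).group()

print(greedy_range, lazy_range)  # 1234 12
print(greedy_open, lazy_open)    # 12345 12
```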
This is essential when you want to match the least amount of characters possible, and you will often see .*? in an expression when you want to fill up some space (foo.*?bar on a string foo bar filler text bar). However, many times a lazy match is an example of bad/inefficient regex. Many people will do something like foo:"(.*?)" to match everything within double quotes, when you can avoid a lazy match by writing your expression like foo:"([^"]+)" and matching anything except a double quote (or foo:"([^"]*)" if the quoted value may be empty).
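Both cases can be tried out with a few lines of Python. This is a sketch only; the quoted sample string is invented for the demo, and the greedy foo:"(.*)" pattern is included just to show what the lazy and negated-class versions are avoiding.

```python
import re

# Filling space with a lazy match stops at the first "bar".
filler = "foo bar filler text bar"
lazy_fill = re.search(r"foo.*?bar", filler).group()
greedy_fill = re.search(r"foo.*bar", filler).group()
print(lazy_fill)    # foo bar
print(greedy_fill)  # foo bar filler text bar

# Hypothetical input with two quoted values.
quoted = 'foo:"one" bar:"two"'
greedy_quote = re.search(r'foo:"(.*)"', quoted).group(1)
lazy_quote = re.search(r'foo:"(.*?)"', quoted).group(1)
negated_quote = re.search(r'foo:"([^"]+)"', quoted).group(1)
print(greedy_quote)   # one" bar:"two
print(lazy_quote)     # one
print(negated_quote)  # one
```

The negated character class gives the same result as the lazy version, without relying on lazy matching.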
Final note: ? typically means "optional", i.e. match {0,1} times. It only makes a match lazy when it follows a range quantifier ({m,n}, *, +, or another ?). This means X? will not make X lazy (since we already said {#}? is pointless); instead, X becomes optional. However, you can do a lazy "optional" match: [0-9]?? will lazily match 0-1 times.
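The difference between the optional ? and the lazy optional ?? can be seen in a short Python sketch. The single-character input and the variable names are chosen only for illustration.

```python
import re

# Greedy optional: takes the digit when one is available.
optional = re.match(r"[0-9]?", "7").group()

# Lazy optional: prefers zero repetitions, so it matches the empty string.
lazy_optional = re.match(r"[0-9]??", "7").group()

# When the whole string must match, the lazy optional is forced to take the digit.
forced = re.fullmatch(r"[0-9]??", "7").group()

print(repr(optional))       # '7'
print(repr(lazy_optional))  # ''
print(repr(forced))         # '7'
```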