Regular expression in regards to question mark "lazy" mode

Question

I understand the ? mark here means "lazy".

My question essentially is [0-9]{2}? vs [0-9]{2}

Are they the same?
If so, why would we write the former expression? Isn't lazy mode more expensive performance-wise?
If not, what is the difference?

Recommended answer

There is no difference between [0-9]{2} and [0-9]{2}?.

The difference between greedy matching and lazy matching (the addition of a ?) has to do with backtracking. Regular expression engines are built to match text from left to right. So by default, when you ask an expression to match a range of characters, it matches as many as possible (greedy) and gives characters back only if the rest of the pattern fails. A lazy quantifier does the opposite: it matches as few as possible and takes more only if the rest of the pattern fails.

Let's say we have the string acac123.

If we use a greedy match of [a-z]+c (+ standing for 1+ repetitions or {1,}):

  • [a-z]+ would match acac and stop at 1
  • then we would try to match the c, but fail at 1
  • now we start backtracking: [a-z]+ gives back one character, and we successfully match aca and c
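
The greedy steps above can be checked with a short sketch. Python's re module is used here only as a convenient backtracking engine; the variable names are illustrative and not part of the original answer.

```python
import re

text = "acac123"

# Greedy: [a-z]+ first consumes "acac", then gives back one character
# so that the trailing "c" in the pattern can match.
greedy = re.search(r"[a-z]+c", text)

print(greedy.group())  # acac
print(greedy.span())   # (0, 4)
```

The overall match is acac: aca from [a-z]+ plus the final c.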

If we make this lazy ([a-z]+?c), we will get a different result (in this case) and do less work:

  • [a-z]+? would match a and then stop, handing over to the rest of the expression (c)
  • the c would then match the next character, successfully matching a and c (with no backtracking)
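
A matching sketch for the lazy case, again assuming Python's re module as the engine (names are illustrative):

```python
import re

text = "acac123"

# Lazy: [a-z]+? takes the minimum (one character, "a"),
# and the following "c" matches immediately.
lazy = re.search(r"[a-z]+?c", text)

print(lazy.group())  # ac
print(lazy.span())   # (0, 2)
```

On the same input the greedy pattern [a-z]+c returns acac, so here the two modes give different results.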

Now you can see that there will be no difference between X{#} and X{#}?, because {#} is a fixed count, not a range, so even a greedy match will not experience any backtracking. Lazy matches are often used with * (0+ repetitions or {0,}) or +, but can also be used with ranges {m,n} (where n is optional).
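
As a quick check of this claim, the sketch below compares a fixed count with a true range. It assumes Python's re module, and the sample string is made up for illustration.

```python
import re

digits = "12345"

# Fixed count: greedy and lazy forms must both take exactly two digits.
fixed_greedy = re.findall(r"[0-9]{2}", digits)
fixed_lazy = re.findall(r"[0-9]{2}?", digits)
print(fixed_greedy, fixed_lazy)  # ['12', '34'] ['12', '34']

# Range: greedy takes as many as allowed, lazy takes as few as allowed.
range_greedy = re.search(r"[0-9]{2,4}", digits).group()
range_lazy = re.search(r"[0-9]{2,4}?", digits).group()
print(range_greedy, range_lazy)  # 1234 12
```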

This is essential when you want to match the fewest characters possible, and you will often see .*? in an expression when you want to fill up some space (foo.*?bar on the string foo bar filler text bar). However, a lazy match is often a sign of a bad or inefficient regex. Many people will write something like foo:"(.*?)" to match everything within double quotes, when you can avoid the lazy match by writing the expression as foo:"([^"]+)", which matches anything but "s. (Note that [^"]+ requires at least one character; use [^"]* if the quoted value may be empty.)
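
The two examples in this paragraph can be tried out as follows. This is a sketch using Python's re module; the quoted sample string is an assumption made up for the demonstration.

```python
import re

filler = "foo bar filler text bar"

# Lazy .*? stops at the first "bar"; greedy .* runs to the last one.
lazy_fill = re.search(r"foo.*?bar", filler).group()
greedy_fill = re.search(r"foo.*bar", filler).group()
print(lazy_fill)    # foo bar
print(greedy_fill)  # foo bar filler text bar

quoted = 'foo:"hello" baz:"world"'  # hypothetical input

# Both patterns capture the same value, but the negated character
# class cannot run past the closing quote, so no laziness is needed.
lazy_quote = re.search(r'foo:"(.*?)"', quoted).group(1)
negated_quote = re.search(r'foo:"([^"]+)"', quoted).group(1)
print(lazy_quote, negated_quote)  # hello hello
```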

A final note: ? typically means "optional", that is, match {0,1} times. ? only makes a match lazy when it follows a quantifier ({m,n}, *, +, or another ?). This means X? will not make X lazy (we already said {#}? is pointless); instead, it makes X optional. However, you can write a lazy "optional" match: [0-9]?? will lazily match 0-1 times.
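
The difference between the optional ? and the lazy ?? can be seen in a small sketch, again assuming Python's re module (variable names are illustrative):

```python
import re

# Greedy optional: takes the digit when one is available.
optional = re.match(r"[0-9]?", "7").group()

# Lazy optional: prefers zero repetitions, so it matches the empty string.
lazy_optional = re.match(r"[0-9]??", "7").group()

# When the whole string must match, the lazy form is forced to take the digit.
forced = re.fullmatch(r"[0-9]??", "7").group()

print(repr(optional), repr(lazy_optional), repr(forced))  # '7' '' '7'
```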
