贪婪/懒惰(非贪婪)/占有量词在内部如何工作? [英] How do greedy / lazy (non-greedy) / possessive quantifiers work internally?

查看:72
本文介绍了贪婪/懒惰(非贪婪)/占有量词在内部如何工作?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我注意到有 3 类不同的量词:贪婪、懒惰(即非贪婪)和占有欲.

I noticed that there are 3 different classes of quantifiers: greedy, lazy (i.e. non-greedy) and possessive.

我知道,笼统地说,贪婪量词会尝试通过首先读取整个输入字符串来获得最长匹配,然后在尝试不断失败时逐个截断字符;lazy 量词尝试通过首先读取空字符串来获得最短匹配,如果尝试一直失败,然后一个一个地添加字符;占有量词尝试与贪婪量词相同的方式,但如果第一次尝试失败,它们将停止匹配.

I know that, loosely speaking, greedy quantifiers try to get the longest match by first reading in the entire input string and then truncate the characters one by one if the attempts keep failing; lazy quantifiers try to get the shortest match by first reading in the empty string and then add in the characters one by one if the attempts keep failing; possessive quantifiers try the same way as greedy quantifiers while they will stop matching if the first attempt fails.

但是,我不确定上述内容是如何在内部"实施的,希望得到澄清(希望有示例).

However, I'm not sure how exactly the aboves are being implemented 'internally', and would like to ask for clarification (hopefully with examples).

例如,假设我们的输入字符串为 "fooaaafoooobbbfoo".

For example, say we have the input string as "fooaaafoooobbbfoo".

如果正则表达式是"foo.*"(贪婪),正则表达式中的foo是否会首先匹配正则表达式中的foo输入字符串,然后 .* 读入 aaafooooobbbfoo 作为整个字符串"?或者 .* 首先将 fooaaafoooobbbfoo 读入整个字符串",然后截断 fooaaafoooobbbbfoo 以尝试匹配 foo在正则表达式中?如果是后者,在每次尝试时 fooaaafooooobbbfoo 会从它的左边还是从它的右边被截断?

If the regex is "foo.*" (greedy), will the foo in the regex first match the foo in the input string, and then .* reads in aaafoooobbbfoo as 'the entire string'? Or will .* first read in fooaaafoooobbbfoo as 'the entire string', and then truncates fooaaafoooobbbfoo to try matching the foo in the regex? If it is the latter, will fooaaafoooobbbfoo be truncated from its left or from its right in each attempt?

如果我将 "foo.*" 替换为 ".*foo""foo.*foo" 作为我的正则表达式?如果我把那些贪婪的量词改成懒惰的量词和占有的量词呢?

Will the answers to the above questions change if I replace "foo.*" with ".*foo" or "foo.*foo" as my regex? What about if I change those greedy quantifiers to lazy ones and possessive ones?

如果正则表达式中有多个量词,引擎将如何处理优先级(如果这很重要)?

And if there are more than one quantifiers in a regex, how will the engine deal with the priority (if that matters)?

提前致谢!

推荐答案

对于你的输入字符串 fooaaafoooobbbfoo.

For your input string fooaaafoooobbbfoo.

案例 1:当您使用此正则表达式时:

Case 1: When you're using this regex:

foo.*

首先记住这个事实,即引擎从左到右遍历.

First remember this fact that engine traverses from left to right.

考虑到这一点,上面的正则表达式将首先匹配输入开始处的 foo,然后 .* 将尽可能匹配 longest匹配 foo 直到结束的其余文本.此时匹配停止,因为在您的模式中 .* 之后没有任何可匹配的内容.

With that in mind above regex will match first foo which is at the start of input and then .* will greedily match longest possible match which is rest of the text after foo till end. At this point matching stops as there is nothing to match after .* in your pattern.

情况 2:当您使用此正则表达式时:

Case 2: When you're using this regex:

.*foo

这里再次 .* 将在匹配最后一个 foo 之前贪婪地匹配 longest 可能的匹配,这正是输入的结尾.

Here again .* will greedily match longest possible match before matching last foo which is right the end of input.

案例 3:当您使用此正则表达式时:

Case 3: When you're using this regex:

foo.*foo

哪个将匹配在输入中找到的第一个 foo ,即在开始时 foo 然后 .* 将贪婪地匹配 longest 匹配最后一个 foo 之前的可能匹配,它正好是输入的结尾.

Which will match first foo found in input i.e. foo at the start then .* will greedily match longest possible match before matching last foo which is right the end of input.

案例 4:当你使用这个带有惰性量词的正则表达式时:

Case 4: When you're using this regex with lazy quantifier:

foo.*?foo

哪个将匹配第一个在输入中找到的 foo ,即 foo 在开头然后 .*? 将懒惰地匹配 shortest 匹配下一个 foo 之前的可能匹配,它是 foo 的第二个实例,从输入中的 6 位置开始.

Which will match first foo found in input i.e. foo at the start then .*? will lazily match shortest possible match before matching next foo which is second instance of foo starting at position 6 in input.

案例 5:当您将此正则表达式与所有格量词一起使用时:

Case 5: When you're using this regex with possessive quantifier:

foo.*+foo

哪个将匹配在输入中找到的第一个 foo ,即 foo 在开头然后 .*+ 使用 占有量词 表示匹配尽可能多的次数,不回馈.这将贪婪地匹配最长可能的匹配直到结束,并且由于所有格量词不允许引擎回溯,因此部分末尾的 foo 将导致失败,因为引擎将失败匹配最后一个 foo.

Which will match first foo found in input i.e. foo at the start then .*+ is using possessive quantifier which means match as many times as possible, without giving back. This will match greedily longest possible match till end and since possessive quantifier doesn't allow engine to backtrack hence presence of foo at the end of part will cause failure as engine will fail to match last foo.

这篇关于贪婪/懒惰(非贪婪)/占有量词在内部如何工作?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆