Regular expression: who's greedier?

Question

My primary concern is with the Java flavor, but I'd also appreciate information regarding others.

Let's say you have a subpattern like this:

(.*)(.*)

Not very useful as is, but let's say these two capture groups (say, \1 and \2) are part of a bigger pattern that matches with backreferences to these groups, etc.

So both are greedy, in that they try to capture as much as possible, only taking less when they have to.

My question is: who's greedier? Does \1 get first priority, giving \2 its share only if it has to?

What about this:

(.*)(.*)(.*)

Let's assume that \1 does get first priority. Let's say it got too greedy, and then spit out a character. Who gets it first? Is it always \2 or can it be \3?

Let's assume it's \2 that gets \1's rejection. If this still doesn't work, who spits out now? Does \2 spit to \3, or does \1 spit out another to \2 first?

What happens if you write something like this:

(.*)(.*?)(.*)

Now \2 is reluctant. Does that mean \1 spits out to \3, and \2 only reluctantly accepts \3's rejection?

Maybe it was a mistake for me not to give concrete examples to show how I'm using these patterns, but here are some:

System.out.println(
    "OhMyGod=MyMyMyOhGodOhGodOhGod"
    .replaceAll("^(.*)(.*)(.*)=(\\1|\\2|\\3)+$", "<$1><$2><$3>")
); // prints "<Oh><My><God>"

// same pattern, different input string
System.out.println(
    "OhMyGod=OhMyGodOhOhOh"
    .replaceAll("^(.*)(.*)(.*)=(\\1|\\2|\\3)+$", "<$1><$2><$3>")
); // prints "<Oh><MyGod><>"

// now \2 is reluctant
System.out.println(
    "OhMyGod=OhMyGodOhOhOh"
    .replaceAll("^(.*)(.*?)(.*)=(\\1|\\2|\\3)+$", "<$1><$2><$3>")
); // prints "<Oh><><MyGod>"


Recommended answer

The addition of your concrete examples changes the nature of the question drastically. It still starts out as I described in my first answer, with the first (.*) gobbling up all the characters, and the second and third groups letting it have them, but then it has to match an equals sign.

Obviously there isn't one at the end of the string, so group #1 gives back characters one by one until the = in the regex can match the = in the target. Then the regex engine starts trying to match (\1|\2|\3)+$ and the real fun starts.
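
To see that first stage in isolation, here is a minimal sketch that matches only the front of the pattern, up to and including the equals sign. The class name PrefixDemo and the helper prefixGroups are made up for illustration; only the regex fragment comes from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrefixDemo {
    // Hypothetical helper: formats the three captures as <g1><g2><g3>.
    static String prefixGroups(String input) {
        // Only the first part of the question's pattern, ending at the '='.
        Matcher m = Pattern.compile("^(.*)(.*)(.*)=").matcher(input);
        if (!m.find()) {
            return "no match";
        }
        return "<" + m.group(1) + "><" + m.group(2) + "><" + m.group(3) + ">";
    }

    public static void main(String[] args) {
        // Group 1 backs off only as far as needed for '=' to match,
        // so it keeps the whole prefix and groups 2 and 3 stay empty.
        System.out.println(prefixGroups("OhMyGod=MyMyMyOhGodOhGodOhGod"));
        // prints "<OhMyGod><><>"
    }
}
```

Without the backreferences, nothing forces group 1 to share, which is why the tail of the pattern is where the interesting behavior comes from.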

Group 1 gives up the d and group 2 (which is still empty) takes it, but the rest of the regex still can't match. Group 1 gives up the o and group 2 matches od, but the rest of the regex still can't match. And so it goes, with the third group getting involved, and the three of them slicing up the input in every way possible until an overall match is achieved. RegexBuddy reports that it takes 13,426 steps to get there.
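
The order in which the splits get tried can be imitated with a small brute-force model. This is only a sketch of the search order for the all-greedy pattern, not how java.util.regex is actually implemented, and it will not reproduce RegexBuddy's step count. The names SplitOrderDemo, buildable and firstSplit are hypothetical, and the model simply ignores empty captures in the repeated part.

```java
public class SplitOrderDemo {
    // True if rest is a concatenation of one or more non-empty pieces.
    static boolean buildable(String rest, String[] pieces) {
        boolean[] ok = new boolean[rest.length() + 1];
        ok[0] = true;
        for (int i = 0; i < rest.length(); i++) {
            if (!ok[i]) {
                continue;
            }
            for (String p : pieces) {
                if (!p.isEmpty() && rest.startsWith(p, i)) {
                    ok[i + p.length()] = true;
                }
            }
        }
        return rest.length() > 0 && ok[rest.length()];
    }

    // Tries the longest group 1 first, then the longest group 2,
    // and returns the first split whose pieces can rebuild rest.
    static String firstSplit(String prefix, String rest) {
        int n = prefix.length();
        for (int k = n; k >= 0; k--) {          // end of group 1
            for (int m = n; m >= k; m--) {      // end of group 2
                String[] g = {
                    prefix.substring(0, k),
                    prefix.substring(k, m),
                    prefix.substring(m)
                };
                if (buildable(rest, g)) {
                    return "<" + g[0] + "><" + g[1] + "><" + g[2] + ">";
                }
            }
        }
        return "no match";
    }

    public static void main(String[] args) {
        System.out.println(firstSplit("OhMyGod", "MyMyMyOhGodOhGodOhGod"));
        // prints "<Oh><My><God>"
        System.out.println(firstSplit("OhMyGod", "OhMyGodOhOhOh"));
        // prints "<Oh><MyGod><>"
    }
}
```

Under those assumptions, the first split that works is the same one the question's first two replaceAll calls report.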

In the first example, greediness (or lack of it) isn't really a factor; the only way a match can be achieved is if the words Oh, My and God are captured in separate groups, so eventually that's what happens. It doesn't even matter which group captures which word--that's just first come, first served, as I said before.

In the second and third examples it's only necessary to break the prefix into two chunks: Oh and MyGod. Group 2 captures MyGod in the second example because it's next in line and it's greedy, just like in the first example. In the third example, every time group 1 drops a character, group 2 (being reluctant) lets group 3 take it instead, so that's the one that ends up in possession of MyGod.
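
The same priority rule shows up on a much smaller input. The sketch below, with a made-up class PriorityDemo and helper groups, matches three adjacent groups against "abc" and prints what each one captured, switching groups from greedy to reluctant one at a time.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PriorityDemo {
    // Hypothetical helper: matches the whole input and formats
    // every capture group as <text>.
    static String groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return "no match";
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= m.groupCount(); i++) {
            sb.append('<').append(m.group(i)).append('>');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // All greedy: the leftmost group takes everything.
        System.out.println(groups("(.*)(.*)(.*)", "abc"));    // prints "<abc><><>"
        // Group 1 reluctant: the next greedy group in line takes it.
        System.out.println(groups("(.*?)(.*)(.*)", "abc"));   // prints "<><abc><>"
        // Groups 1 and 2 reluctant: everything falls through to group 3.
        System.out.println(groups("(.*?)(.*?)(.*)", "abc"));  // prints "<><><abc>"
    }
}
```

In each case the text goes to the leftmost group that is willing to take it, which matches the "first come, first served" description above.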

It's more complicated (and tedious) than that, of course, but I hope this answers your question. And I have to say, that's an interesting target string you chose; if it were possible for a regex engine to have an orgasm, I think these regexes would be the ones to bring it off. :D
