Why is this simple .*? non-greedy regex being greedy?

Problem description

I have a very simple regex similar to this:

HOHO.*?_HO_

With this test string...

fiwgu_HOHO_HOHO_HOHOrgh_HOHO_feh_HOHO___HO_fbguyev

  • I expect it to match just _HOHO___HO_ (shortest match, non-greedy)
  • Instead it matches _HOHO_HOHO_HOHOrgh_HOHO_feh_HOHO___HO_ (longest match, looks greedy).

Why? How can I make it match the shortest match?

Adding and removing the ? gives the same result.
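A minimal reproduction, assuming Node.js since the answer below discusses JavaScript's engine. It uses the pattern exactly as quoted above, so the match begins at HOHO rather than _HOHO; otherwise it is the same span the question reports:

```javascript
// Test string from the question.
const text = "fiwgu_HOHO_HOHO_HOHOrgh_HOHO_feh_HOHO___HO_fbguyev";

// Lazy and greedy versions of the same pattern.
const lazy = text.match(/HOHO.*?_HO_/);
const greedy = text.match(/HOHO.*_HO_/);

console.log(lazy[0]);    // "HOHO_HOHO_HOHOrgh_HOHO_feh_HOHO___HO_"
console.log(greedy[0]);  // the same string: there is only one _HO_ to stop at
console.log(lazy.index); // 6: the position of the first HOHO
```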

Edit - better test string that shows why [^HOHO] doesn't work: fiwgu_HOHO_HOHO_HOHOrgh_HOHO_feh_HOHO_H_O_H_O_HO_fbguye
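For context, [^HOHO] is a character class, not a negated string: it matches one character that is neither H nor O (the repeated letters are redundant), so it behaves like [^HO]. A sketch in JavaScript (the variable names are illustrative) showing that it happens to work on the first string but finds nothing in the edited one:

```javascript
const original = "fiwgu_HOHO_HOHO_HOHOrgh_HOHO_feh_HOHO___HO_fbguyev";
const edited = "fiwgu_HOHO_HOHO_HOHOrgh_HOHO_feh_HOHO_H_O_H_O_HO_fbguye";

// [^HOHO] is the same as [^HO]: any single character except "H" or "O".
const classAttempt = /HOHO[^HOHO]*?_HO_/;

console.log(original.match(classAttempt)[0]); // "HOHO___HO_"
// The gap before _HO_ in the edited string contains H and O, so no match.
console.log(edited.match(classAttempt));      // null
```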

All I can think of is that maybe it is matching multiple times - but there's only one match for _HO_, so I don't understand why it isn't taking the shortest match that ends at the _HO_, discarding the rest.
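One way to check the "matching multiple times" idea, assuming Node.js 12+ for matchAll: with the g flag, each new search resumes after the end of the previous match, so matches never overlap and this string yields exactly one:

```javascript
const text = "fiwgu_HOHO_HOHO_HOHOrgh_HOHO_feh_HOHO___HO_fbguyev";

// Collect every match of the lazy pattern with the global flag.
const all = [...text.matchAll(/HOHO.*?_HO_/g)];

console.log(all.length);                      // 1
console.log(all[0].index);                    // 6: starts at the first HOHO
console.log(all[0].index + all[0][0].length); // 43: ends just after _HO_
```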

I've browsed all the questions I can find with titles like "Non-greedy regex acts greedy", but they all seem to have some other problem.

Recommended answer

I found the explanation in "Regex lazy vs greedy confusion".

In regex engines like the one used by JavaScript (NFA engines, I believe), non-greedy only gives you the match that is shortest going left to right: from the first left-hand match that fits to the nearest right-hand match.

Where there are many left-hand matches for one right-hand match, it will always go from the first it reaches (which will actually give the longest match).

Essentially, it goes through the string one character at a time asking "Are there matches from this character? If so, match the shortest and finish. If no, move to next character, repeat". I expected it to be "Are there matches anywhere in this string? If so, match the shortest of all of them".
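The two strategies can be contrasted with a small simulation, assuming Node.js. It is not how the engine is implemented internally; it just uses the sticky (y) flag to try the pattern anchored at each start position. The function names firstFromLeft and shortestOfAll are hypothetical:

```javascript
const text = "fiwgu_HOHO_HOHO_HOHOrgh_HOHO_feh_HOHO___HO_fbguyev";
// Sticky ("y") flag: a match must start exactly at lastIndex.
const anchored = /HOHO.*?_HO_/y;

// What the engine does: try each start position in turn and
// return the match from the first position that succeeds.
function firstFromLeft(str, re) {
  for (let start = 0; start < str.length; start++) {
    re.lastIndex = start;
    const m = re.exec(str);
    if (m) return m[0];
  }
  return null;
}

// What the question expected: collect the match from every start
// position and keep the shortest one.
function shortestOfAll(str, re) {
  let best = null;
  for (let start = 0; start < str.length; start++) {
    re.lastIndex = start;
    const m = re.exec(str);
    if (m && (best === null || m[0].length < best.length)) best = m[0];
  }
  return best;
}

console.log(firstFromLeft(text, anchored)); // "HOHO_HOHO_HOHOrgh_HOHO_feh_HOHO___HO_"
console.log(shortestOfAll(text, anchored)); // "HOHO___HO_"
```

Every HOHO in the string (at indices 6, 11, 16, 24 and 33) leads to the same single _HO_, so the first candidate is also the longest.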

You can approximate a regex that is non-greedy in both directions by replacing the . with a negation meaning "not the left-side match". Negating a whole string like this requires a negative lookahead and a non-capturing group, but it's as simple as wrapping the string in (?:(?! and ).). For example, (?:(?!HOHO).) matches any single character that is not the start of HOHO.
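A quick check of the tempered token on its own, assuming Node.js; the sample strings are made up for illustration. Repeating the token across the whole input succeeds only when HOHO never appears:

```javascript
// One "tempered" character: any character, provided HOHO does not start here.
const notHoho = "(?:(?!HOHO).)";
const noHohoAnywhere = new RegExp(`^${notHoho}*$`);

console.log(noHohoAnywhere.test("_feh_HOH_O"));   // true: no complete HOHO
console.log(noHohoAnywhere.test("rgh_HOHO_feh")); // false: contains HOHO
```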

For example, the equivalent of HOHO.*?_HO_ which is non-greedy on the left and right would be:

HOHO(?:(?!HOHO).)*?_HO_
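Applying this pattern to both test strings from the question, assuming Node.js, shows that each now yields the shortest span:

```javascript
const original = "fiwgu_HOHO_HOHO_HOHOrgh_HOHO_feh_HOHO___HO_fbguyev";
const edited = "fiwgu_HOHO_HOHO_HOHOrgh_HOHO_feh_HOHO_H_O_H_O_HO_fbguye";

// The gap after HOHO may not contain another HOHO, so every attempt
// that starts at an earlier HOHO fails and the match begins at the last one.
const bothSides = /HOHO(?:(?!HOHO).)*?_HO_/;

console.log(original.match(bothSides)[0]); // "HOHO___HO_"
console.log(edited.match(bothSides)[0]);   // "HOHO_H_O_H_O_HO_"
```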

So the regex engine is essentially going through each character like this:

  • HOHO - Does this match the left side?
  • (?:(?!HOHO).)* - If so, can I reach the right-hand side without any repeats of the left side?
  • _HO_ - If so, grab everything until the right-hand match
  • ? modifier on * or + - If there are multiple right-hand matches, choose the nearest one
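The four steps above can be packaged as a small builder for arbitrary literal delimiters. This is a sketch assuming Node.js: betweenNearest and escapeRegExp are hypothetical helpers (the escaping uses the commonly cited character set for JavaScript regex metacharacters), and because . does not match line terminators, it assumes single-line input, as the original pattern does:

```javascript
// Hypothetical helper: escape regex metacharacters in a literal string.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build LEFT(?:(?!LEFT).)*?RIGHT from literal strings.
function betweenNearest(left, right) {
  const l = escapeRegExp(left);
  const r = escapeRegExp(right);
  return new RegExp(`${l}(?:(?!${l}).)*?${r}`);
}

const re = betweenNearest("HOHO", "_HO_");
console.log(re.source); // "HOHO(?:(?!HOHO).)*?_HO_"
console.log("fiwgu_HOHO_HOHO_HOHOrgh_HOHO_feh_HOHO___HO_fbguyev".match(re)[0]); // "HOHO___HO_"
```

Escaping matters as soon as a delimiter contains characters such as . or *, which would otherwise be read as regex syntax.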
