Lazy quantifier {,}? not working as I would expect


Question

I have an issue with lazy quantifiers. Or, most likely, I misunderstand how I am supposed to use them.

I am testing on Regex101. My test string is, let's say: 123456789D123456789

.{1,5} matches 12345

.{1,5}? matches 1

I am fine with both matches.

.{1,5}?D matches 56789D! I would expect it to match 9D.

Thanks for clarifying this.

Recommended answer

First and foremost, do not think of greediness and laziness in regex as a means of getting the longest or shortest match. The terms "greedy" and "lazy" pertain only to the rightmost character a pattern can match; they have no effect on the leftmost one. A lazy quantifier guarantees that the end of the matched substring is the first one found, not the last one found (which is what a greedy quantifier would return).
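To illustrate that point, here is a minimal sketch using Python's standard re module (the original post tested on Regex101; the patterns 5.*D and 5.*?D and the longer test string are chosen here purely for illustration). Both matches start at the same index, and only the end differs.

```python
import re

s = "123456789D12345D678904"

greedy = re.search(r"5.*D", s)   # extends to the last D it can reach
lazy = re.search(r"5.*?D", s)    # stops at the first D it can reach

print(greedy.span(), greedy.group())  # (4, 16) 56789D12345D
print(lazy.span(), lazy.group())      # (4, 10) 56789D
```

The start index is fixed by the left-to-right scan; the quantifier's greediness only decides where the match ends.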

The regex engine analyzes a string from left to right. It searches for the first position where the pattern can match and, once it finds a matching substring there, returns it as the match.

Let's see how the engine parses the string with .{1,5}?D: 1 is matched and D is tested for. No D is found after 1, so the engine expands the lazy quantifier to match 12 and tries D again. 3 follows 2, so the engine expands the lazy dot again, and keeps doing so up to 5 characters in total. After expanding to the maximum, it has 12345 and the next character is still not D. Since the quantifier has reached its upper limit, the attempt fails and the next starting position is tested.

The same thing happens at each starting position up to 5. When the engine reaches 5, it tries to match 5D and fails, tries 56D and fails, then 567D and 5678D, which fail too, and when it tries 56789D, bingo, the match is found.
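The walk-through above can be imitated with a small hand-written loop. This is only a sketch of the idea, not how any real engine is implemented: trace_lazy is a hypothetical helper written for this example, and it assumes the text contains no newlines (which the dot would not match by default).

```python
import re

def trace_lazy(text, literal="D", lo=1, hi=5):
    """Imitate .{lo,hi}? followed by a literal: scan start positions
    left to right, and at each one grow the dot run one character at a time."""
    attempts = []
    for start in range(len(text)):
        for n in range(lo, hi + 1):
            end = start + n
            if end > len(text):
                break
            # Record what the engine is hoping to see at this step.
            attempts.append(text[start:end] + literal)
            if text.startswith(literal, end):
                return text[start:end + len(literal)], attempts
    return None, attempts

s = "123456789D123456789"
match, attempts = trace_lazy(s)

print(match)          # 56789D
print(attempts[-5:])  # ['5D', '56D', '567D', '5678D', '56789D']
print(match == re.search(r".{1,5}?D", s).group())  # True
```

The outer loop over start positions is what pins the match to the earliest possible start; the lazy quantifier only controls the order of the inner loop.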

This makes it clear that a lazily quantified subpattern at the beginning of a pattern will act "greedily" by default, that is, it will not match the shortest possible substring.
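A quick way to check this, sketched with Python's re module on the string from the question: the lazy and the greedy version of the pattern return the same match, because the earliest start position that can succeed is the same for both.

```python
import re

s = "123456789D123456789"

lazy_match = re.search(r".{1,5}?D", s).group()
greedy_match = re.search(r".{1,5}D", s).group()

print(lazy_match, greedy_match)  # 56789D 56789D
```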

[Visualization from regex101.com]

Now, here is a fun fact: .{1,5}? at the end of a pattern will always match exactly 1 character (if there is one), because the requirement is to match at least 1, and that is sufficient to return a valid match. So, if you write D.{1,5}?, you will get D1 and D6 in 123456789D12345D678904.
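This is easy to confirm with a short Python sketch (re.findall is used here just to list all matches; the greedy variant is added for contrast and is not part of the original answer):

```python
import re

s = "123456789D12345D678904"

lazy_hits = re.findall(r"D.{1,5}?", s)   # trailing lazy dot stops at the minimum
greedy_hits = re.findall(r"D.{1,5}", s)  # trailing greedy dot takes up to 5

print(lazy_hits)    # ['D1', 'D6']
print(greedy_hits)  # ['D12345', 'D67890']
```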

Fun fact 2: In .NET, you can "ask" the regex engine to analyze the string from right to left with the help of the RightToLeft modifier. Then, with .{1,5}?D, you will get 9D.
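Python's standard re module has no right-to-left mode, so the following is only a rough emulation of the idea, not a reproduction of the .NET behavior: the string is reversed, the pattern is mirrored by hand (D.{1,5}? instead of .{1,5}?D), and the result is reversed back.

```python
import re

s = "123456789D123456789"

# Search the reversed string with the hand-mirrored pattern.
m = re.search(r"D.{1,5}?", s[::-1])

# Reverse the matched text to read it in the original direction.
result = m.group()[::-1]
print(result)  # 9D
```

Scanning from the right puts the lazy dot at the end of the scan direction, which is why it now stops after a single character.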

Fun fact 3: In .NET, (?<=(.{1,5}?))D will capture 9 into Group 1 if 123456789D is passed as input. This happens because of the way lookbehind is implemented in .NET regex (.NET reverses the string as well as the pattern inside the lookbehind, then attempts to match that single pattern on the reversed string). In Java, (?<=(.{1,5}))D (the greedy version) will capture 9, because it tries all the possible fixed-width patterns in the range, from the shortest to the longest, until one succeeds.

And a solution: if you know you need 1 character followed by D, just use

/.D/
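In Python, for instance, the same pattern (written without the /…/ delimiters) gives the match the question was after:

```python
import re

s = "123456789D123456789"

one_char_then_d = re.search(r".D", s).group()
print(one_char_then_d)  # 9D
```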
