Capture a stream of digits that is not followed by certain digits


Question

I wanted to capture a stream of digits which are not followed by certain digits. For example:

input = abcdef lookbehind 123456..... asjdnasdh lookbehind 789432

I want to capture 789432 and not 123 using negative lookahead only.

I tried (?<=lookbehind )([\d])+(?!456) but it captures 123456 and 789432.

Using (?<=lookbehind )([\d])+?(?!456) captures only 1 and 7.

Grouping is not an option for me as my use case doesn't allow me to do it.

Is there any way I can capture 789432 and not 123 using pure regex? An explanation for the answer is appreciated.

Solution

You may use a possessive quantifier with a negative lookbehind:

(?<=lookbehind )\d++(?<!456)
                  ^^ ^^^^^^ 

See this regex demo.

A synonymous pattern with an atomic group:

(?<=lookbehind )(?>\d+)(?<!456)

Details

  • (?<=lookbehind ) - a positive lookbehind that matches a location in the string that is immediately preceded by "lookbehind " (the word followed by a space)
  • \d++ - one or more digits matched possessively: the engine never backtracks into this pattern to give up any of the digits it has matched
  • (?<!456) - a negative lookbehind that fails the match if the last three digits matched by \d++ are 456

Why lookbehind and not lookahead

The negative lookbehind (?<!...) makes sure that a certain pattern does not match immediately to the left of the current location. A negative lookahead (?!...) fails the match if its pattern matches immediately to the right of the current location. "Fail" here means that the regex engine abandons the current way of matching the string; if there are quantified patterns before the lookbehind/lookahead, the engine may backtrack into them and try to match the string differently. Since \d++ has already consumed every digit, the character to the right of the current location is never a digit (or it is the end of the string), so (?!456) would always succeed; only a lookbehind can inspect the digits that were just matched. Note also that the possessive quantifier means the lookbehind check for 456 is performed only once, after \d++ has grabbed all the digits, rather than again after every backtracking step.

Your (?<=lookbehind )([\d])+(?!456) regex matches 123456 because ([\d])+ matches these digits greedily (all at once), and (?!456) then checks for 456 after them; since 456 does not follow them (the next character is a dot), the match is returned. The (?<=lookbehind )([\d])+?(?!456) regex matches only one digit because ([\d])+? is lazy: it matches 1 digit and then the lookahead check is performed. Since 1 is not followed by 456 (it is followed by 234), 1 is returned.
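The two attempts from the question can be reproduced as follows. This sketch runs on any Python 3 version, since it uses no possessive quantifiers; the helper name whole_matches is hypothetical. It uses finditer rather than findall, because findall would return only the last digit captured by ([\d]) in each match.

```python
import re

TEXT = "abcdef lookbehind 123456..... asjdnasdh lookbehind 789432"

def whole_matches(pattern: str, text: str) -> list[str]:
    # group(0) is the full match; group(1) would hold only the last
    # digit captured by the repeated ([\d]) group.
    return [m.group(0) for m in re.finditer(pattern, text)]

greedy = whole_matches(r"(?<=lookbehind )([\d])+(?!456)", TEXT)
lazy = whole_matches(r"(?<=lookbehind )([\d])+?(?!456)", TEXT)

print(greedy)  # ['123456', '789432']
print(lazy)    # ['1', '7']
```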

Why the ++ possessive quantifier

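A possessive quantifier does not allow the regex engine to backtrack into the quantified pattern and retry matching the string differently. Without it, (?<=lookbehind )\d+(?<!456) matches 12345 in 123456: when (?<!456) fails after 123456, the engine gives back the 6, and at the position before 6 the three preceding digits are 345, not 456, so the match succeeds.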

