Regular Expression Lookbehind doesn't work with quantifiers ('+' or '*')


Problem description

I am trying to use lookbehinds in a regular expression, and they don't seem to work as I expected. This is not my real use case, but a simplified example: imagine I want to match "example" in a string that says "this is an example". According to my understanding of lookbehinds, this should work:

(?<=this\sis\san\s*?)example

This should find "this is an", then any whitespace characters, and finally match the word "example". It doesn't work, and I don't understand why. Is it impossible to use '+' or '*' inside lookbehinds?

I also tried the following two patterns. They work correctly, but they don't fulfill my needs:

(?<=this\sis\san\s)example
this\sis\san\s*?example

I am using this site to test my regular expressions: http://gskinner.com/RegExr/

Recommended answer

Many regular expression libraries allow only restricted expressions inside lookbehind assertions. Depending on the library, a lookbehind may be required to:

  • only match strings of the same fixed length: (?<=foo|bar|\s,\s) (three characters each)
  • only match strings of fixed lengths: (?<=foobar|\r\n) (each branch with a fixed length)
  • only match strings with an upper-bound length: (?<=\s{0,4}) (up to four repetitions)
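As a rough illustration of the strictest case, here is a sketch using Python's built-in re module. Python is chosen only as an example; the question does not say which engine sits behind the test site. The re module accepts a lookbehind only when every alternative has the same fixed width, and rejects anything else at compile time. The helper name compiles is made up for this sketch.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Fixed-width lookbehind (exactly one whitespace character at the end).
fixed = r"(?<=this\sis\san\s)example"
# The lazy star makes the lookbehind width variable.
variable = r"(?<=this\sis\san\s*?)example"
# Alternatives that are all three characters wide.
same_width_alts = r"(?<=foo|bar|\s,\s)x"
# Alternatives of different widths (six and two characters).
diff_width_alts = r"(?<=foobar|\r\n)x"
# Bounded repetition: the width has an upper limit but is still not fixed.
bounded = r"(?<=\s{0,4})x"

patterns = (fixed, variable, same_width_alts, diff_width_alts, bounded)
results = {p: compiles(p) for p in patterns}

for pattern, ok in results.items():
    print("accepted" if ok else "rejected", pattern)
```

Under Python's rules only the first and third patterns compile. Engines that support the looser categories in the list above would accept more of them, so the result depends on the library in use.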

These limitations exist mainly because those libraries cannot process regular expressions backwards at all, or can do so only for a limited subset.

Another reason could be to keep authors from building overly complex regular expressions that are expensive to process because of so-called pathological behavior (see also ReDoS).

See also the section on lookbehind limitations at Regular-Expressions.info.
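When the engine rejects a quantifier inside a lookbehind, one common workaround (not part of the original answer, and possibly not a fit for the asker's real use case) is to move the prefix out of the lookbehind and capture only the wanted word in a group. A minimal sketch in Python, where the sample input string is an assumption for illustration:

```python
import re

# The \s* is outside any lookbehind here, so its variable width is allowed.
pattern = re.compile(r"this\sis\san\s*(example)")

text = "this is an    example"  # hypothetical input with several spaces
match = pattern.search(text)

if match:
    word = match.group(1)       # only the captured word, without the prefix
    start, end = match.span(1)  # position of the captured word in the input
    print(word, start, end)
```

The overall match still includes the prefix, so this only helps where the calling code can read a group instead of the whole match.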
