How does the regular expression ‘(?<=#)[^#]+(?=#)’ work?

Question

I have the following regex in a C# program, and have difficulties understanding it:

(?<=#)[^#]+(?=#)

I'll break it down to what I think I understood:

(?<=#)    a group, matching a hash. what's `?<=`?
[^#]+     one or more non-hashes (used to achieve non-greediness)
(?=#)     another group, matching a hash. what's the `?=`?

So the problem I have is the ?<= and ?< part. From reading MSDN, ?<name> is used for naming groups, but in this case the angle bracket is never closed.

I couldn't find ?= in the docs, and searching for it is really difficult, because search engines will mostly ignore those special chars.

Recommended answer

They are called lookarounds; they let you assert whether a pattern matches at the current position without actually consuming any characters, so the lookaround itself matches zero characters. There are 4 basic lookarounds:

  • Positive lookarounds: see if we CAN match the pattern...
    • (?=pattern) - ... to the right of the current position (lookahead)
    • (?<=pattern) - ... to the left of the current position (lookbehind)
  • Negative lookarounds: see if we CANNOT match the pattern...
    • (?!pattern) - ... to the right
    • (?<!pattern) - ... to the left
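A minimal sketch of all four lookarounds. It uses Python's `re` module rather than C# (an assumption made for illustration); the lookaround syntax is the same in .NET's `System.Text.RegularExpressions`. The sample string and the helper `starts` are made up for this example: each pattern looks for `cat` and uses a lookaround to check what is, or is not, next to it.

```python
import re

# Hypothetical sample text: "cat" occurs at indices 3, 7 and 11.
text = "tomcat cat catfish"


def starts(pattern):
    """Return the start index of every non-overlapping match of `pattern` in `text`."""
    return [m.start() for m in re.finditer(pattern, text)]


# Positive lookahead: "cat" only where "fish" follows.
print(starts(r"cat(?=fish)"))   # [11]
# Negative lookahead: "cat" only where "fish" does NOT follow.
print(starts(r"cat(?!fish)"))   # [3, 7]
# Positive lookbehind: "cat" only where "tom" precedes.
print(starts(r"(?<=tom)cat"))   # [3]
# Negative lookbehind: "cat" only where "tom" does NOT precede.
print(starts(r"(?<!tom)cat"))   # [7, 11]

# The lookaround is zero-width: the matched text is just "cat", not "catfish".
print(re.search(r"cat(?=fish)", text).group())  # cat
```

Note that `fish` and `tom` are only checked, never included in the returned matches.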

As an easy reminder, for a lookaround:

  • `=` is positive, `!` is negative
  • `<` means lookbehind, otherwise it's lookahead

One might argue that the lookarounds in the pattern above aren't necessary, and that #([^#]+)# will do the job just fine (extracting the string captured by group 1, \1, to get the run of non-# characters).

Not quite. The difference is that a lookaround doesn't consume the #, so that same # can be "used" again by the next attempt to find a match. Simplistically speaking, lookarounds allow "matches" to overlap.
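One way to see that the `#` is only asserted, not consumed, is to compare match positions. A small Python sketch (Python's `re` stands in for .NET here; the input `"#ab#cd#"` is hypothetical) using `Match.span()`, which returns the start and end index of a match:

```python
import re

# Hypothetical input: the '#' at index 3 sits between "ab" and "cd".
text = "#ab#cd#"

# Capturing version: each match consumes both '#'. The first match is "#ab#"
# (indices 0-3), so the '#' at index 3 is used up and "cd" has no leading '#'.
consuming = [(m.span(), m.group(1)) for m in re.finditer(r"#([a-z]+)#", text)]
print(consuming)   # [((0, 4), 'ab')]

# Lookaround version: the '#' characters are only checked, never consumed,
# so the '#' at index 3 closes "ab" and also opens "cd".
lookaround = [(m.span(), m.group()) for m in re.finditer(r"(?<=#)[a-z]+(?=#)", text)]
print(lookaround)  # [((1, 3), 'ab'), ((4, 6), 'cd')]
```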

Consider the following input string:

      and #one# and #two# and #three#four#
      

Now, #([a-z]+)# will give the following matches (as seen on rubular.com):

      and #one# and #two# and #three#four#
          \___/     \___/     \_____/
      

Compare this with (?<=#)[a-z]+(?=#), which matches:

      and #one# and #two# and #three#four#
           \_/       \_/       \___/ \__/
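The examples above use `[a-z]+`, while the question's pattern uses `[^#]+`. Since a space is also "not a `#`", the question's pattern additionally matches the text that sits between two tagged words on this input. A quick check with Python's `re` (assumed here in place of .NET; both treat this pattern the same way):

```python
import re

text = "and #one# and #two# and #three#four#"

# The question's pattern: any run of non-'#' characters with a '#' on each side,
# including the " and " runs between "#one#" and "#two#", etc.
print(re.findall(r"(?<=#)[^#]+(?=#)", text))
# ['one', ' and ', 'two', ' and ', 'three', 'four']

# Restricting the middle to lowercase letters skips those gaps.
print(re.findall(r"(?<=#)[a-z]+(?=#)", text))
# ['one', 'two', 'three', 'four']
```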
      

Unfortunately, at the time of writing the lookbehind version couldn't be demonstrated on rubular.com, since it didn't support lookbehind. However, it did support lookahead, so we can do something similar with #([a-z]+)(?=#), which matches (as seen on rubular.com):

      and #one# and #two# and #three#four#
          \__/      \__/      \____/\___/
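The three sets of matches illustrated above can be reproduced with Python's `re` module, used here in place of rubular.com or .NET purely as an illustration (it supports both lookahead and fixed-width lookbehind). The helper `matches` is hypothetical and returns the full text of each match:

```python
import re

text = "and #one# and #two# and #three#four#"


def matches(pattern):
    """Return the full text of every non-overlapping match of `pattern`."""
    return [m.group(0) for m in re.finditer(pattern, text)]


# Both '#' consumed: the '#' shared by "three" and "four" is used up, so "four" is missed.
print(matches(r"#([a-z]+)#"))          # ['#one#', '#two#', '#three#']

# Both '#' only asserted: every word is found, and no '#' is part of any match.
print(matches(r"(?<=#)[a-z]+(?=#)"))   # ['one', 'two', 'three', 'four']

# Leading '#' consumed, trailing '#' only asserted: four matches, each starting with '#'.
print(matches(r"#([a-z]+)(?=#)"))      # ['#one', '#two', '#three', '#four']

# For the last pattern, the bare words come from capture group 1.
print(re.findall(r"#([a-z]+)(?=#)", text))  # ['one', 'two', 'three', 'four']
```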
      
