Can conditionals be used to pair balance group elements?

Question

TL;DR: Is there a way to specify a conditional so that an opening element MUST match its paired closing element?

An example is available on regex101.com.

======

Balancing elements in regex is typically handled through recursion. This means that nested {...{...{...}...}...} can be located.

Also, PCRE allows the (?(DEFINE)...) construct, which lets you define various patterns without actually starting the match.

In the following regex:

# Define the opening and closing elements before the recursion occurs
(?(DEFINE)
  (?<open_curly>\{)
  (?<close_curly>\})
  # ... other definitions here ...

  (?<open>\g'open_curly')
  (?<close>\g'close_curly')
)

# Match the opening element
(\g'open'
  (?>
    # For recursion, don't match either the opening or closing element
    (?!\g'open'|\g'close')(?s:.)
  |
    # Recurse this captured pattern
    (?-1)
  )*

# Match the closing element
\g'close')

the elements are the { and } characters, and the pattern can match input such as

{{{}}}
{ test1 { test2 { test3 { test4 } } } }

I want to include other open/close elements, such as [ and ], begin and end, or --[ and --], so I add those to the (?(DEFINE)):

(?<open_square>\[)
(?<close_square>\])
(?P<open_pascal>(?i:\bbegin\b))
(?P<close_pascal>(?i:\bend\b))
(?P<open_lua>--\[)
(?P<close_lua>--\])
(?<open>\g'open_curly'|\g'open_square'|\g'open_pascal'|\g'open_lua')
(?<close>\g'close_curly'|\g'close_square'|\g'close_pascal'|\g'close_lua')

What this DOESN'T do correctly is pair each opening element with its own closing element: it allows --[ to group with }, which is not desirable.
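
The mispairing can be reproduced without recursion. The following is a minimal Python sketch using the standard `re` module, which has no recursion, so nesting is left out. The names `OPEN`, `CLOSE`, `BODY`, `loose`, and `strict` are illustrative assumptions and not part of the original pattern. It contrasts two independent alternations with one alternative per delimiter pair:

```python
import re

# Any opener / any closer, as in the combined <open> and <close> groups.
OPEN = r"(?:--\[|\{|\[)"
CLOSE = r"(?:--\]|\}|\])"

# Body: any character that does not start an opener or a closer.
BODY = r"(?:(?!" + OPEN + "|" + CLOSE + r").)*"

# Loose: the two alternations are independent, so any opener can be
# closed by any closer.
loose = re.compile(OPEN + BODY + CLOSE, re.S)

# Strict: each opener sits in the same alternative as its own closer.
strict = re.compile(
    r"--\[" + BODY + r"--\]"
    r"|\{" + BODY + r"\}"
    r"|\[" + BODY + r"\]",
    re.S,
)

print(bool(loose.fullmatch("--[ x }")))     # True: mismatched pair accepted
print(bool(strict.fullmatch("--[ x }")))    # False: mismatched pair rejected
print(bool(strict.fullmatch("--[ x --]")))  # True
```

The strict form only works because each delimiter pair is written out in its own alternative, which is the price of enforcing the pairing.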

Is there a way to create open/close pairs in a regex like this?

Recommended answer

I would say there is no use polluting the logic with a bunch of named groups and
unnecessary, unpredictable recursion.

There are three main parts to maintaining a proper recursion (listed below).
To do it right, you must parse everything, so you have to take into account
content and unbalanced errors.

The engine won't let you capture anything in detail beyond the first level.
That means it's easier to maintain a TOP that you can pull info from and a CORE
that you can't inspect but that does the recursion. There are a few subtle
differences between the two, which can be seen in the examples below.

Any time a core is recursed, it is immediately surrounded by its own
set (pair) of unique delimiters. This is what lets the stack unwind properly.
This process can't be factored out (generalized).

Update
Usually this regex is called within a recursive function, passing the CORE to it each time.
Example pseudo-code:

// Pseudo-code: walks one level of CORE and recurses into each nested core.
bool RecurseCore( string core )
{
     bool bIsOk = true;
     // Each successful search takes the next item off this level.
     while ( bIsOk && regex_search( core, regex, match ) )
     {
          if      ( match[1].success ) { print 'content' }
          else if ( match[2].success ) { print 'square'; bIsOk = RecurseCore( match[2].value ) }
          else if ( match[3].success ) { print 'curly';  bIsOk = RecurseCore( match[3].value ) }
          else if ( match[4].success ) { print 'pascal'; bIsOk = RecurseCore( match[4].value ) }
          else if ( match[5].success ) { print 'lua';    bIsOk = RecurseCore( match[5].value ) }
          else if ( match[6].success ) { print 'error';  bIsOk = false }  // unbalanced delimiter
     }
     return bIsOk;
}

Regex:

 # //////////////////////////////////////////////////////
 # // The General Guide to 3-Part Recursive Parsing
 # // ----------------------------------------------
 # // Part 1. CONTENT
 # // Part 2. CORE
 # // Part 3. ERRORS

 (?si)                      # Dot all, no case

 (?:
      (                          # (1), Take off CONTENT
           (?&content) 
      )
   |                           # OR
      \[                         # Square's delimiter
      (                          # (2), square CORE
           (?= . )
           (?&core) 
        |  
      )
      \]                         # End-Delimiter
   |                           # OR
      \{                         # Curly's delimiter
      (                          # (3), curly CORE
           (?= . )
           (?&core) 
        |  
      )
      \}                         # End-Delimiter
   |                           # OR
      \b begin \b                # Pascal's delimiter
      (                          # (4), pascal CORE
           (?= . )
           (?&core) 
        |  
      )
      \b end \b                  # End-Delimiter
   |                           # OR  
      --\[                       # Lua's delimiter
      (                          # (5), lua CORE
           (?= . )
           (?&core) 
        |  
      )
      --\]                       # End-Delimiter
   |                           # OR
      (                          # (6), Take off Unbalanced (delimiter) ERRORS
           \b 
           (?: begin | end )
           \b 
        |  -- [\[\]] 
        |  [\[\]{}] 
      )
 )

 # ///////////////////////
 # // Subroutines
 # // ---------------

 (?(DEFINE)

      # core
      (?<core>
           (?>
                (?&content) 
             |  
                \[                         
                (?:                        # Square delimiter
                     (?= . )                    # recurse core
                     (?&core)                   
                  |  
                )
                \]
             |                           # OR
                \{
                (?:                        # Curly delimiter
                     (?= . )                    # recurse core 
                     (?&core) 
                  |  
                )
                \}      
             |                           # OR
                \b begin \b 
                (?:                        # Pascal delimiter
                     (?= . )                    # recurse core 
                     (?&core) 
                  |  
                )
                \b end \b     
             |                           # OR
                --\[
                (?:                        # Lua delimiter
                     (?= . )                    # recurse core 
                     (?&core) 
                  |  
                )
                --\]      
           )+
      )

      # content 
      (?<content>
           (?>
                (?!
                     \b 
                     (?: begin | end )
                     \b 
                  |  -- [\[\]] 
                  |  [\[\]{}] 
                )
                . 
           )+
      )

 )
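
The three parts above (CONTENT, CORE, ERRORS) can also be sketched outside a recursive regex. The following Python sketch swaps the recursion for a tokenizer plus an explicit stack, because Python's standard `re` module does not support `(?&name)` subroutine calls. `check_balanced`, `TOKEN`, and `PAIRS` are hypothetical names, and this is not the answer's method, only an illustration of the same pairing rule under those assumptions:

```python
import re

# Delimiter tokens, mirroring the ERRORS alternation of the regex above.
TOKEN = re.compile(r"\b(?:begin|end)\b|--[\[\]]|[\[\]{}]", re.I)

# Each opener maps to (kind, the only closer allowed to end it).
PAIRS = {
    "[": ("square", "]"),
    "{": ("curly", "}"),
    "begin": ("pascal", "end"),
    "--[": ("lua", "--]"),
}

def check_balanced(text):
    """Return (ok, events); events lists content, open, close and error items."""
    stack, events, pos = [], [], 0
    for m in TOKEN.finditer(text):
        if m.start() > pos:
            # Part 1: CONTENT between delimiters.
            events.append(("content", text[pos:m.start()]))
        pos = m.end()
        tok = m.group().lower()
        if tok in PAIRS:
            # Part 2: entering a CORE; remember which closer it needs.
            kind, closer = PAIRS[tok]
            stack.append(closer)
            events.append(("open", kind))
        elif stack and stack[-1] == tok:
            stack.pop()
            events.append(("close", tok))
        else:
            # Part 3: ERRORS, a closer that does not pair with its opener.
            events.append(("error", tok))
            return False, events
    if pos < len(text):
        events.append(("content", text[pos:]))
    # Leftover entries on the stack are openers that were never closed.
    return not stack, events

print(check_balanced("{ a [ b ] --[ c --] }")[0])  # True
print(check_balanced("--[ x }")[0])                # False
```

The stack plays the role of the per-pair delimiters around each recursed core: the closer that is accepted is always the one pushed by the most recent opener.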
