Regex help needed to parse and extract properties from an expression tree

Question

Here is a valid property tree expression (it can be recursive):

rootProperty:(prop1, prop2, subProp1:(prop1,subSubProp1:(prop1,prop2,etc),prop3), prop3, etc)

So in effect a property can have many properties and sub-properties. From this expression I would like to capture the following:

  • rootProperty
  • prop1
  • prop2
  • subProp1:(prop1,subSubProp1:(prop1,prop2,etc),prop3)
  • prop3

I tried a few approaches but couldn't get the repetitions working recursively, hence I am seeking help.

Thanks, Kannan

Recommended answer

This is not a regular language due to recursion (balanced parentheses), so a regular expression might not be what you need. But assuming you know what you are doing, and your regex engine supports recursion with (?R), as PCRE does:

([^:(), ]+)(?::\(((?R)?(?:, ?(?R))*)\))?

First we capture the name of the property: one or more characters that are not ':', '(', ')', ',' or a space.

([^:(), ]+)
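
As a quick check of this first piece on its own, here is a minimal Python sketch (not part of the original answer). It uses only the name pattern, which Python's built-in re module can handle; the variable names are chosen for illustration.

```python
import re

# The name part of the pattern: one or more characters that are
# not ':', '(', ')', ',' or a space.
NAME = re.compile(r"([^:(), ]+)")

# match() anchors at the start of the string, so it stops at the ':'.
m = NAME.match("rootProperty:(prop1, prop2)")
name = m.group(1)
print(name)
```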

A property may or may not have a subtree, so the next part is the optional subtree:

(?:           <--- do not capture
   :          <--- literal ':'
   \(         <--- literal '('
      ...     <--- some stuff inside
   \)         <--- literal ')'
)?            <--- it is optional
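
To see the optional group in isolation, here is a small Python sketch. It is a deliberate simplification and not the answer's pattern: the "some stuff inside" placeholder is replaced by [^()]*, which only handles a subtree with no further nesting, because Python's built-in re module does not support (?R).

```python
import re

# Simplified, non-recursive variant: "[^()]*" stands in for the
# recursive part, so only one level of nesting is handled.
ONE_LEVEL = re.compile(r"([^:(), ]+)(?::\(([^()]*)\))?")

with_subtree = ONE_LEVEL.fullmatch("subSubProp1:(prop1,prop2,etc)")
without_subtree = ONE_LEVEL.fullmatch("prop3")

# Group 2 holds the subtree text, or None when the optional part is absent.
print(with_subtree.group(1), "|", with_subtree.group(2))
print(without_subtree.group(1), "|", without_subtree.group(2))
```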

The stuff inside captures a list of properties:

(             <--- do capture
 (?R)?        <--- recursively match a property (optional)
 (?:          <--- do not capture
    , ?       <--- comma followed by optional space
    (?R)      <--- recursively match another property
 )*           <--- any number of comma separated properties
)             <--- end capture
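
The list structure can be tried out on a leaf list, where no property has a subtree. In the Python sketch below, each (?R) is replaced by the plain name pattern, which is an assumption made only so that the built-in re module can run it; NAME and FLAT_LIST are illustrative names.

```python
import re

NAME = r"[^:(), ]+"

# A name, then any number of ", name" or ",name" repetitions.
# Each (?R) of the full pattern is replaced by NAME here.
FLAT_LIST = re.compile(rf"({NAME}(?:, ?{NAME})*)")

m = FLAT_LIST.fullmatch("prop1,prop2, etc")
items = re.findall(NAME, m.group(1))
print(items)

# A doubled comma is not a valid list, so nothing matches.
bad = FLAT_LIST.fullmatch("prop1,,prop2")
print(bad)
```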

For your example input:

Input:
    rootProperty:(prop1, prop2, subProp1:(prop1,subSubProp1:(prop1,prop2,etc),prop3), prop3, etc)
Match 1:
    rootProperty:(prop1, prop2, subProp1:(prop1,subSubProp1:(prop1,prop2,etc),prop3), prop3, etc)
    Group 1:
        rootProperty
    Group 2:
        prop1, prop2, subProp1:(prop1,subSubProp1:(prop1,prop2,etc),prop3), prop3, etc

You could then recursively match the second group of each match for capturing the properties of a subtree. There should be a way to get the backtracking information so you don't need to do this, but I don't know how.

Input:
    prop1, prop2, subProp1:(prop1,subSubProp1:(prop1,prop2,etc),prop3), prop3, etc
Match 1:
    prop1
Match 2:
    prop2
Match 3:
    subProp1:(prop1,subSubProp1:(prop1,prop2,etc),prop3)
    Group 1:
        subProp1
    Group 2:
        prop1,subSubProp1:(prop1,prop2,etc),prop3
Match 4:
    prop3
Match 5:
    etc

Then:

Input:
    prop1,subSubProp1:(prop1,prop2,etc),prop3
Match 1:
    prop1
Match 2:
    subSubProp1:(prop1,prop2,etc)
    Group 1:
        subSubProp1
    Group 2:
        prop1,prop2,etc
Match 3:
    prop3

Finally:

Input:
    prop1,prop2,etc
Match 1:
    prop1
Match 2:
    prop2
Match 3:
    etc

https://regex101.com/r/WAXrFd/2
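
The repeated "match, take group 2, match again" steps can also be done in one pass without a recursive regex. The following is a minimal Python sketch, not part of the original answer: it swaps the regex for a small hand-written parser that tracks parenthesis depth, since Python's built-in re module has no (?R). The function names split_top_level and parse_property are made up for illustration, and the input is assumed to be well formed.

```python
def split_top_level(text):
    """Split a property list on commas that are not inside parentheses."""
    items, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            items.append(text[start:i].strip())
            start = i + 1
    tail = text[start:].strip()
    if tail:
        items.append(tail)
    return items


def parse_property(text):
    """Return (name, children) where children is a list of parsed properties."""
    name, sep, rest = text.partition(":(")
    if not sep:
        # A plain property without a subtree.
        return (text, [])
    if not rest.endswith(")"):
        raise ValueError("missing closing parenthesis in: " + text)
    inner = rest[:-1]  # drop the ')' that closes this subtree
    return (name, [parse_property(item) for item in split_top_level(inner)])


expr = ("rootProperty:(prop1, prop2, "
        "subProp1:(prop1,subSubProp1:(prop1,prop2,etc),prop3), prop3, etc)")
root_name, children = parse_property(expr)
print(root_name)
for child_name, grandchildren in children:
    print(child_name, len(grandchildren))
```

The first element of each tuple corresponds to group 1 in the walkthrough above, and the list of children corresponds to splitting group 2 into its matches.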
