Regex to parse functions with arbitrary depth

Problem description

I'm parsing a simple language (Excel formulas) for the functions contained within. A function name must start with a letter, followed by any number of letters/numbers, and end with an open paren (no spaces in between), for example MyFunc(. The function can contain any arguments, including other functions, and must end with a close paren ). Of course, math within parens is allowed, as in =MyFunc((1+1)), and (1+1) shouldn't be detected as a function because it fails the function rule I've just described. My goal is to recognize the highest-level function calls in a formula, identify the function name, and extract the arguments. With the arguments, I can recursively look for other function calls.
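As a quick illustration of the name rule on its own (a sketch, not part of the original question; the CallStart class and its Find method are made up for this example), the pattern [a-z][a-z0-9]*\( with RegexOptions.IgnoreCase finds the start of MyFunc( but nothing in a bare (1+1) or in a name followed by a space:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class CallStart
{
    // The function-name rule from the question: a letter, then any letters/digits,
    // then "(" with no space in between. IgnoreCase so "MyFunc" and "myfunc" both match.
    static readonly Regex NameRule =
        new Regex(@"[a-z][a-z0-9]*\(", RegexOptions.IgnoreCase);

    // Returns every substring that looks like the start of a function call.
    public static string[] Find(string formula) =>
        NameRule.Matches(formula).Cast<Match>().Select(m => m.Value).ToArray();
}
```

The rule alone only locates where a call starts; finding where it ends requires counting parens, which is the hard part discussed below.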

Using this tutorial, I hacked up the following regexes. Neither does the trick: they both fail on the test case pasted below.

This should work but completely fails:

(?<name>[a-z][a-z0-9]*\()(?<body>(?>[a-z][a-z0-9]*\((?<DEPTH>)|\)(?<-DEPTH>)|.?)*(?(DEPTH)(?!)))\)

This works for many test cases, but fails for the test case below. I don't think it handles nested functions correctly; it just looks for open paren/close paren in the nesting:

(?<name>[a-z][a-z0-9]*\()(?<body>(?>\((?<DEPTH>)|\)(?<-DEPTH>)|.?)*(?(DEPTH)(?!)))\)

Here's the test that breaks them all:

=Date(Year(A$5),Month(A$5),1)-(Weekday(Date(Year(A$5),Month(A$5),1))-1)+{0;1;2;3;4;5}*7+{1,2,3,4,5,6,7}-1

This should be matched as:

Date(ARGUMENTS1)
Weekday(ARGUMENTS2)
Where ARGUMENTS2 = Date(Year(A$5),Month(A$5),1)

Instead it matches:

ARGUMENTS2 = Date(Year(A$5),Month(A$5),1)-1)

I am using the .NET Regex class, which provides external memory in the form of balancing groups.
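For readers unfamiliar with the feature: a .NET balancing group treats a named group as a stack of captures. The minimal sketch below (the Balance class and the group name D are illustrative) checks whether a whole string has balanced parens, using the same three pieces the answer relies on: a push on "(", a pop on ")", and a final conditional that fails if the stack is not empty.

```csharp
using System.Text.RegularExpressions;

public static class Balance
{
    // (?<D>) pushes an empty capture onto the "D" stack for every "(",
    // (?<-D>) pops one for every ")" and fails if the stack is already empty,
    // and (?(D)(?!)) fails at the end if any "(" was left unclosed.
    static readonly Regex Balanced = new Regex(
        @"^(?>\((?<D>)|\)(?<-D>)|[^()]+)*(?(D)(?!))$");

    public static bool IsBalanced(string s) => Balanced.IsMatch(s);
}
```

Because (?<-D>) fails on an empty stack, a stray ) stops the match instead of driving the count negative, so "a)b(" is rejected even though it contains one paren of each kind.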

Solution

This is well within the capabilities of .NET regexes. Here's a working demo:

using System;
using System.Text.RegularExpressions;

namespace Test
{
  class Test
  {
    public static void Main()
    {
      Regex r = new Regex(@"
        (?<name>[a-z][a-z0-9]*\()
          (?<body>
            (?>
               \((?<DEPTH>)
             |
               \)(?<-DEPTH>)
             |
               [^()]+
            )*
            (?(DEPTH)(?!))
          )
        \)", RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

      string formula = @"=Date(Year(A$5),Month(A$5),1)-(Weekday(Date(Year((A$5+1)),Month(A$5),1))-1)+{0;1;2;3;4;5}*7+{1,2,3,4,5,6,7}-1";

      foreach (Match m in r.Matches(formula))
      {
        Console.WriteLine("{0}\n", m.Value);
      }
    }
  }
}

output:

Date(Year(A$5),Month(A$5),1)

Weekday(Date(Year((A$5+1)),Month(A$5),1))
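The demo prints only m.Value. Since the question's goal is to identify the function name and extract the arguments, here is a sketch that reads the name and body groups instead and then splits the body at top-level commas. The FormulaCalls class and its SplitArgs helper are hypothetical; the splitter assumes that commas nested inside (...) or {...} belong to a single argument and, like the pattern, does not handle quoted strings.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class FormulaCalls
{
    // The answer's pattern, unchanged.
    static readonly Regex Call = new Regex(@"
        (?<name>[a-z][a-z0-9]*\()
          (?<body>
            (?>
               \((?<DEPTH>)
             |
               \)(?<-DEPTH>)
             |
               [^()]+
            )*
            (?(DEPTH)(?!))
          )
        \)", RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

    // Returns the name and argument list of each top-level call. The "name"
    // group includes the opening paren, so it is trimmed off here.
    public static List<(string Name, List<string> Args)> Find(string formula)
    {
        var calls = new List<(string Name, List<string> Args)>();
        foreach (Match m in Call.Matches(formula))
            calls.Add((m.Groups["name"].Value.TrimEnd('('), SplitArgs(m.Groups["body"].Value)));
        return calls;
    }

    // Hypothetical helper: splits argument text at commas that are not nested
    // inside (...) or {...}.
    static List<string> SplitArgs(string body)
    {
        var args = new List<string>();
        if (body.Length == 0) return args;  // zero-argument call such as NOW()
        var current = new StringBuilder();
        int depth = 0;
        foreach (char c in body)
        {
            if (c == '(' || c == '{') depth++;
            else if (c == ')' || c == '}') depth--;
            else if (c == ',' && depth == 0)
            {
                args.Add(current.ToString());
                current.Clear();
                continue;
            }
            current.Append(c);
        }
        args.Add(current.ToString());
        return args;
    }
}
```

For the formula in the demo, Find returns Date with the arguments Year(A$5), Month(A$5) and 1, and Weekday with the single argument Date(Year((A$5+1)),Month(A$5),1).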

The main problem with your regex was that you were including the function name as part of the recursive match, for example:

Name1(...Name2(...)...)

Any open paren that wasn't preceded by a name was not counted, because it was matched by the final alternative, |.?), and that threw off the balance with the close parens. That also meant that you couldn't match formulas like =MyFunc((1+1)), which you mentioned in the text but didn't include in the example. (I threw in an extra set of parens to demonstrate.)
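The question's plan was to recurse into each call's arguments. A minimal sketch of that, assuming the answer's pattern is simply reapplied to each body (the CallTree name and the two-space indentation are arbitrary choices for this example):

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CallTree
{
    // The answer's pattern, unchanged.
    static readonly Regex Call = new Regex(@"
        (?<name>[a-z][a-z0-9]*\()
          (?<body>
            (?>
               \((?<DEPTH>)
             |
               \)(?<-DEPTH>)
             |
               [^()]+
            )*
            (?(DEPTH)(?!))
          )
        \)", RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

    // Lists every call in the formula, outermost first, indenting each
    // function name by two spaces per level of nesting.
    public static List<string> Walk(string formula)
    {
        var lines = new List<string>();
        Walk(formula, 0, lines);
        return lines;
    }

    // Matches the top-level calls in `text`, then recurses into each body,
    // which is exactly the argument text of that call.
    static void Walk(string text, int depth, List<string> lines)
    {
        foreach (Match m in Call.Matches(text))
        {
            lines.Add(new string(' ', 2 * depth) + m.Groups["name"].Value.TrimEnd('('));
            Walk(m.Groups["body"].Value, depth + 1, lines);
        }
    }
}
```

On the demo formula, Walk yields Date with Year and Month nested under it, then Weekday with Date nested under it and Year and Month one level deeper. On =MyFunc((1+1)) it yields only MyFunc, since the inner (1+1) has no name in front of it and is just counted by the balancing group.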

EDIT: Here's the version with support for non-significant parens inside quoted strings:

  Regex r = new Regex(@"
    (?<name>[a-z][a-z0-9]*\()
      (?<body>
        (?>
           \((?<DEPTH>)
         |
           \)(?<-DEPTH>)
         |
           ""[^""]*""
         |
           [^()""]+
        )*
        (?(DEPTH)(?!))
      )
    \)", RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
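A quick check of the quoted-string version, written as a sketch (the QuotedCalls name is illustrative). The string-literal alternative here uses [^""]* (zero or more), so that an empty literal such as "" in =IF(A1="",1,2) is consumed as a unit; with + the match would stop at the empty literal and fail. Excel escapes a quote inside a string by doubling it, and the pattern copes with that because each doubled quote simply closes one literal and opens the next.

```csharp
using System.Text.RegularExpressions;

public static class QuotedCalls
{
    // Quoted-string version: a whole "..." literal is consumed by its own
    // alternative, so any parens inside it are never pushed or popped.
    static readonly Regex Call = new Regex(@"
        (?<name>[a-z][a-z0-9]*\()
          (?<body>
            (?>
               \((?<DEPTH>)
             |
               \)(?<-DEPTH>)
             |
               ""[^""]*""
             |
               [^()""]+
            )*
            (?(DEPTH)(?!))
          )
        \)", RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

    // Returns the first top-level call, or "" if there is none.
    public static string FirstCall(string formula) => Call.Match(formula).Value;
}
```

For comparison, the first (unquoted) pattern stops early on =Concat("a)b",Len(A1)) and returns Concat("a), because it treats the ) inside the string as the closing paren.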
