Perl regular expression: match nested brackets

Problem description

I'm trying to match nested {} brackets with a regular expression in Perl so that I can extract certain pieces of text from a file. This is what I have currently:

my @matches = $str =~ /\{(?:\{.*\}|[^\{])*\}|\w+/sg;

foreach (@matches) {
    print "$_\n";
}

Sometimes this works as expected. For instance, if $str = "abc {{xyz} abc} {xyz}", I get:

abc
{{xyz} abc}
{xyz}

as expected. But for other input strings it doesn't work. For example, if $str = "{abc} {{xyz}} abc", the output is:

{abc} {{xyz}}
abc

which is not what I expected. I would have wanted {abc} and {{xyz}} to be on separate lines, since each is balanced on its own in terms of brackets. Is there an issue with my regular expression? If so, how would I go about fixing it?

Solution

You were surprised by how your pattern matched, but no one explained it? Here's how your pattern matches "{abc} {{xyz}} abc":

my @matches = $str =~ /\{(?:\{.*\}|[^\{])*\}|\w+/sg;
                       ^    ^ ^ ^  ^      ^
                       |    | | |  |      |
{ ---------------------+    | | |  |      |
a --------------------------)-)-)--+      |
b --------------------------)-)-)--+      |
c --------------------------)-)-)--+      |
} --------------------------)-)-)--+      |
  --------------------------)-)-)--+      |
{ --------------------------+ | |         |
{ ----------------------------+ |         |
x ----------------------------+ |         |
y ----------------------------+ |         |
z ----------------------------+ |         |
} ------------------------------+         |
} ----------------------------------------+

As you can see, the problem is that /\{.*\}/ matches too much: here it matches {{xyz}, which isn't balanced. What should be in there instead is something that matches

(?: \s* (?: \{ ... \} | \w+ ) )*

where the ... is

(?: \s* (?: \{ ... \} | \w+ ) )*

So you need some recursion. Named groups are an easy way of doing this.

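use strict;
use warnings;
use feature qw( say );

$_ = "{abc} {{xyz}} abc";   # sample input from the question

say $1
   while /
      \G \s*+ ( (?&WORD) | (?&BRACKETED) )

      (?(DEFINE)
         (?<WORD>      \s* \w+ )
         (?<BRACKETED> \s* \{ (?&TEXT)? \s* \} )
         (?<TEXT>      (?: (?&WORD) | (?&BRACKETED) )+ )
      )
   /xg;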
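Named groups aren't the only way to recurse. As a more compact alternative (a swapped-in technique, not the answer's code), Perl 5.10+ can also recurse into a numbered capture group with (?1). The sketch below assumes a hypothetical helper named top_level_items; the possessive [^{}]++ keeps the engine from backtracking into runs of non-brace text.

```perl
use strict;
use warnings;

# Hypothetical helper: returns each top-level balanced {...} group or bare
# word in $str, in order. Uses numbered-group recursion (?1), Perl 5.10+.
sub top_level_items {
    my ($str) = @_;
    my @items;
    while ($str =~ /
        ( \{ (?: [^{}]++ | (?1) )* \} )   # group 1: balanced braces, (?1) recurses into group 1
        | ( \w+ )                         # group 2: a bare word
    /xg) {
        push @items, $1 // $2;            # exactly one of the two groups matched
    }
    return @items;
}

print "$_\n" for top_level_items("{abc} {{xyz}} abc");
```

Unlike the \G-anchored loop, this unanchored loop silently skips any character that fits neither alternative (for example a stray }), which may or may not be what you want.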

But instead of reinventing the wheel, why not use Text::Balanced?
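A minimal sketch of the Text::Balanced route, assuming the core module's extract_multiple and extract_bracketed functions; balanced_fields is a hypothetical helper name. extract_bracketed pulls off one balanced {...} group (skipping leading whitespace by default), qr/\w+/ picks up bare words, and the final 1 tells extract_multiple to drop anything neither extractor recognizes, such as the spaces between items.

```perl
use strict;
use warnings;
use Text::Balanced qw( extract_bracketed extract_multiple );

# Hypothetical helper: split $str into top-level {...} groups and bare words.
sub balanced_fields {
    my ($str) = @_;
    return extract_multiple(
        $str,
        [
            sub { extract_bracketed($_[0], '{}') },  # one balanced {...} group
            qr/\w+/,                                  # one bare word
        ],
        undef,   # no limit on the number of fields
        1,       # ignore text that no extractor matches (e.g. whitespace)
    );
}

print "$_\n" for balanced_fields("{abc} {{xyz}} abc");
```

The bracket matching and the bookkeeping of positions are handled by the module, so the only regex left to write is the one for bare words.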