Regular expression matching


Question

I want to write a regular expression which matches anything between:

()
(())
(()())
((()))
()()()

Recommended answer

All these answers claiming you can't use patterns to match a string with balanced nested parens are quite wrong. It's not practical to pretend that the patterns supported by modern programming languages are restricted to "regular languages" in the pathological textbook sense. As soon as you permit backreferences, they're not. This allows real-world patterns to match much more than the textbook versions, making them far more practical.
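As a small illustration of the backreference point, here is a minimal sketch, assuming any reasonably modern Perl. The pattern accepts exactly those strings that consist of some word repeated twice, which is a standard example of a language that no textbook regular expression can describe. The helper name is_doubled and the sample strings are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the whole string is one word
# immediately followed by a copy of itself (via the \1 backreference).
sub is_doubled {
    my ($text) = @_;
    return $text =~ /\A(\w+)\1\z/ ? 1 : 0;
}

for my $candidate (qw(abcabc abcabd xyxy xyz)) {
    my $verdict = is_doubled($candidate) ? "doubled" : "not doubled";
    print "$candidate: $verdict\n";
}
```

Only abcabc and xyxy are reported as doubled.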

The simplest pattern for matching balanced parens is \((?:[^()]*+|(?0))*\). But you should never write that, because it is too compact to be easily read. You should always write it in /x mode to allow for whitespace and comments. So write it like this:

m{
  \(              # literal open paren
     (?:          # begin alternation group
         [^()]*+  #  match nonparens possessively
       |          # or else
         (?0)     #  recursively match entire pattern
     )*           # repeat alternation group
  \)              # literal close paren
}x
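To try this against the strings from the question, here is a minimal sketch, assuming Perl 5.10 or later (the first release with recursive patterns and possessive quantifiers). The helper name is_balanced, the group name pair, and the \A ... \z anchoring are illustrative additions, not part of the answer. A named group is recursed into instead of (?0), because (?0) re-enters the whole pattern and would drag the anchors into every recursion.

```perl
use 5.010;
use strict;
use warnings;

# One or more balanced groups spanning the whole string.
my $balanced_rx = qr{
    \A
    (?<pair>            # one balanced group
        \(              # literal open paren
        (?:
            [^()]*+     # nonparens, possessively
          |
            (?&pair)    # or a nested balanced group
        )*
        \)              # literal close paren
    )+
    \z
}x;

# Hypothetical helper wrapping the match in a boolean.
sub is_balanced {
    my ($text) = @_;
    return $text =~ $balanced_rx ? 1 : 0;
}

for my $sample ('()', '(())', '(()())', '((()))', '()()()', '(()', '())') {
    printf "%-8s %s\n", $sample, is_balanced($sample) ? "balanced" : "unbalanced";
}
```

The five strings from the question are reported as balanced, and the last two as unbalanced.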

There's also a lot to be said for naming your abstractions, and for decoupling their definition and ordering from their execution. That leads to this sort of thing:

my $nested_paren_rx = qr{

    (?&nested_parens)

    (?(DEFINE)

        (?<open>       \(       )
        (?<close>       \)      )
        (?<nonparens> [^()]     )

        (?<nested_parens>
            (?&open)
            (?:
                (?&nonparens) *+
              |
                (?&nested_parens)
            ) *
            (?&close)
        )

    )
}x;

The second form is now amenable to inclusion in larger patterns.
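As one possible illustration of such inclusion, here is a minimal sketch, assuming Perl 5.10 or later. It interpolates the named-subpattern form into a larger pattern that picks out a call-like identifier and its balanced argument list. The names $call_rx, first_call, name, and args are hypothetical and chosen only for this example.

```perl
use 5.010;
use strict;
use warnings;

# The named form, as given above.
my $nested_paren_rx = qr{
    (?&nested_parens)
    (?(DEFINE)
        (?<open>      \(    )
        (?<close>     \)    )
        (?<nonparens> [^()] )
        (?<nested_parens>
            (?&open)
            (?:
                (?&nonparens) *+
              |
                (?&nested_parens)
            ) *
            (?&close)
        )
    )
}x;

# Larger pattern: an identifier followed by a balanced argument list.
my $call_rx = qr{
    (?<name> \w+ )
    \s*
    (?<args> $nested_paren_rx )
}x;

# Hypothetical helper: returns (name, args) for the first call found,
# or an empty list when nothing matches.
sub first_call {
    my ($text) = @_;
    return $text =~ $call_rx ? ($+{name}, $+{args}) : ();
}

my ($name, $args) = first_call("f(g(x), h(y, z)) + 1");
print "$name\n";   # prints "f"
print "$args\n";   # prints "(g(x), h(y, z))"
```

Because the pattern is not anchored, it finds the leftmost identifier that is followed by a balanced group. Parens inside quoted strings would still be counted, so this is a sketch rather than a parser.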

Don't ever let anybody tell you that you can't use a pattern to match something that's recursively defined. As I've just demonstrated, you most certainly can.

While you're at it, make sure never to write line-noise patterns. You don't have to, and you shouldn't. No programming language can be maintainable that forbids white space, comments, subroutines, or alphanumeric identifiers. So use all those things in your patterns.

Of course, it does help to pick the right language for this kind of work. ☺
