Nested PCRE Regex Issue

Question

I have a custom template engine.

It matches this:

@function(argument1 argument2 ...)
@get(param:name)
@get(param:@get(sub:name))

And this:

@function(argument1 argument2 ...)

    Some stuff @with(nested:tag)

    @foreach(arguments as value)
        More stuff : @get(value)
    @/foreach

    @function(other:args)
        Same function name (nested)
    @/function

@/function

With this pattern (PCRE / PHP):

#

@ ([\w]+) \(

( (?: [^@\)] | (?R) )+ )

\)

(?:
    ( (?> (?-2) ) )

    @/\\1
)?

#xms

This regex catches almost all cases. But when I have more tags, nested or not, it matches nothing. For example, with two nested @foreach(var:name) ... @/foreach blocks, the regex fails depending on the whitespace in the tag content.

Recommended answer

Using named subpatterns is sometimes clearer. I suggest you use this:

~
@(?<com>\w+)                 # command name
\s*                          # optional whitespace before the arguments
(?: \( (?<args>[^)]*) \) )?+ # optional parameters
(?:
    (?<content>(?:[^@]+|(?R))*+) # content (may be empty)
    @/\g{com}                    # close the command
)?+                          # the closing part is optional
~x
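To see how this pattern behaves, here is a minimal PHP sketch. The template string and the variable names are made up for illustration. The pattern is written with the x modifier, which this free-spacing layout with # comments needs in order to compile as intended.

```php
<?php
// Hypothetical template: a block command that contains one inline command.
$template = '@foreach(items as item) Item: @get(item) @/foreach';

$pattern = <<<'EOD'
~
@(?<com>\w+)                     # command name
\s*                              # optional whitespace before the arguments
(?: \( (?<args>[^)]*) \) )?+     # optional argument list
(?:
    (?<content>(?:[^@]+|(?R))*+) # content, possibly empty or nested
    @/\g{com}                    # closing tag of the same command
)?+                              # the block form is optional
~x
EOD;

if (preg_match($pattern, $template, $m)) {
    echo $m['com'], "\n";     // foreach
    echo $m['args'], "\n";    // items as item
    echo $m['content'], "\n"; // " Item: @get(item) " (spaces included)
}
```

The inner @get(item) is consumed by the recursive (?R) call as an inline command, so it ends up inside the content of the outer @foreach instead of being reported as a separate match.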

If you need to allow commands inside arguments, you can replace (?<args>[^)]*) with (?<args>(?:[^@)]+|(?=@)(?R))*+)
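As a rough check of that replacement, the sketch below applies the modified pattern to the nested @get example from the question. The variable names are illustrative only.

```php
<?php
// Input taken from the question: a command whose argument holds another command.
$template = '@get(param:@get(sub:name))';

$pattern = <<<'EOD'
~
@(?<com>\w+)
\s*
(?: \( (?<args>(?:[^@)]+|(?=@)(?R))*+) \) )?+   # arguments may contain commands
(?:
    (?<content>(?:[^@]+|(?R))*+)
    @/\g{com}
)?+
~x
EOD;

preg_match($pattern, $template, $m);
echo $m[0], "\n";      // @get(param:@get(sub:name))
echo $m['args'], "\n"; // param:@get(sub:name)
```

With the original [^)]* version, the argument list would stop at the first closing parenthesis, so the outer command would end after "sub:name" and leave one ")" unmatched.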

But when you are trying to describe a language, a better way is to use the (?(DEFINE)...) syntax to describe the elements first, before the main pattern. For example:

$pattern = <<<'EOD'
~
(?(DEFINE)
    (?<command_name> \w+ )
    (?<inline_command> @ \g<command_name> \s* \g<params>? )
    (?<multil_command> @ (\g<command_name>) \s* \g<params>? \g<content> @/ \g{-1} )
    (?<command> \g<multil_command> | \g<inline_command> )

    (?<other> [^@()]+ ) 
    (?<param> \g<other> | \g<command> )
    (?<params> \( \s* \g<param> (?: \s+ \g<param> )* \s* \) )

    (?<content> (?: \g<other> | \g<command> )* )
)
# main pattern
\g<command>
~x
EOD;
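A short usage sketch for this grammar-style pattern follows. The two template strings are hypothetical. Groups inside (?(DEFINE)...) are only reached through subroutine calls, so the example reads the overall match $m[0] rather than named captures.

```php
<?php
$pattern = <<<'EOD'
~
(?(DEFINE)
    (?<command_name> \w+ )
    (?<inline_command> @ \g<command_name> \s* \g<params>? )
    (?<multil_command> @ (\g<command_name>) \s* \g<params>? \g<content> @/ \g{-1} )
    (?<command> \g<multil_command> | \g<inline_command> )

    (?<other> [^@()]+ )
    (?<param> \g<other> | \g<command> )
    (?<params> \( \s* \g<param> (?: \s+ \g<param> )* \s* \) )

    (?<content> (?: \g<other> | \g<command> )* )
)
# main pattern
\g<command>
~x
EOD;

// A block command with a nested inline command matches as a whole.
$template = '@foreach(items as item) Item: @get(item) @/foreach';
preg_match($pattern, $template, $m);
echo $m[0], "\n"; // @foreach(items as item) Item: @get(item) @/foreach

// "other" excludes parentheses, so literal ( ) in the content stops the
// block form and only the inline part is matched.
$withParens = '@function(a) text (nested) @/function';
preg_match($pattern, $withParens, $p);
echo $p[0], "\n"; // @function(a)
```

The second case shows a limit of the grammar as written: content such as "Same function name (nested)" from the question would need "other" to be relaxed before the enclosing block could match.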

With this kind of syntax, if you want to extract the elements at the ground level, you only need to change the main pattern to: @(?<com> \g<command_name> ) \s* (?<args>\g<params> )? (?: (?<con> \g<content> ) @/ \g{com} )? (NB: to obtain the other levels, put it inside a lookahead.)
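The sketch below illustrates both variants with preg_match_all: the ground-level main pattern, and the same pattern wrapped in a lookahead so that nested commands are reported too. The $define and $main variables and the template are assumptions made for the example, not part of the original answer.

```php
<?php
// The element definitions, kept in one string so both patterns can share them.
$define = <<<'EOD'
(?(DEFINE)
    (?<command_name> \w+ )
    (?<inline_command> @ \g<command_name> \s* \g<params>? )
    (?<multil_command> @ (\g<command_name>) \s* \g<params>? \g<content> @/ \g{-1} )
    (?<command> \g<multil_command> | \g<inline_command> )
    (?<other> [^@()]+ )
    (?<param> \g<other> | \g<command> )
    (?<params> \( \s* \g<param> (?: \s+ \g<param> )* \s* \) )
    (?<content> (?: \g<other> | \g<command> )* )
)
EOD;

// Main pattern with real captures for the name, arguments and content.
$main = '@(?<com> \g<command_name> ) \s* (?<args> \g<params> )?'
      . ' (?: (?<con> \g<content> ) @/ \g{com} )?';

$template = '@foreach(items as item) Item: @get(item) @/foreach';

// Ground level: the outer command consumes the nested one.
preg_match_all('~' . $define . $main . '~x', $template, $top);
print_r($top['com']); // foreach only

// All levels: a lookahead consumes nothing, so the search continues
// inside the text that the outer command covers.
preg_match_all('~' . $define . '(?=' . $main . ')~x', $template, $all);
print_r($all['com']); // foreach, then get
```

Note that args includes the surrounding parentheses here, because the params element matches them as part of the argument list.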
