How to match text inside starting and closing curly braces, the tags and the specified attributes

Problem description


I am implementing plugin code for my CMS, something like a shortcode but applicable in many scenarios. I want to support cases where an admin writes code like this:

Example 1:

{COMMAND_NAME}Strings of text that contain HTML tags, symbols, just anything{/COMMAND_NAME}

Example 2

{COMMAND_NAME}

Example 3

{COMMAND_NAME{attribute1=value attribute2=value}}

Example 4

{COMMAND_NAME{attribute1=value attribute2=value}}Strings of anything, including text, HTML tags and anything at all {/COMMAND_NAME}

I need a single regex pattern that can match all of the strings above: get the COMMAND_NAME, get the text in between, and get the closing {/COMMAND_NAME}.

In the regex, I want to capture the COMMAND_NAME, the attributes if provided, the text in between if the {COMMAND_NAME} has a closing {/COMMAND_NAME}, and the closing {/COMMAND_NAME} if provided.

Here is what I've done so far; it gives an incomplete result.

$regex = '#\{(RAW|ACCESS|DWNLINK|MODL)[\{]{0,1}([\w\W\s]*?)\}{0}\}([\w\s]+)([\{/RAW|ACCESS|DWNLINK|MODL]*)\}#i';

$strings = '<div class="blog-list-item blog"><header class="entry-title">
        <h1>Welcome to our website</h1>
    </header><article id="entry-72" class="entry post-72 page et-bg-layout-dark et-white-bg"><div class="jumbotron row">
<div class="col-md-8">
<ul>
<li>You have a pending job on your neck?&hellip;</li>
<li>Do your company need a website makeover ?&hellip;</li>
<li>Or a competitive web application ? ?&hellip;</li>
<li>Do you need a customized plugin, or a tweak ?&hellip;</li>
<li>Maybe you want a personal website ?&hellip;</li>
<li>Or a graphic for your new project ?&hellip;</li>
</ul>
<div class="bg-primary well">
<h4 class="text-center text-white shadow">Track your project as we work it         to perfection...</h4>
</div>
</div>
<div class="pull-right col-md-4">
<h4 class="bg-primary text-white well">Other services we offer</h4>
{ACCESS{type=500}}
<ul>
<li>SEO work for an existing website or new</li>
<li>Bulk SMS</li>
<li>E-currency exchange</li>
<li>Facebook AD</li>
<li>Google AD</li>
</ul>
{/ACCESS}</div>
{RAW{say=email,access=500}} {RAW} <a class="btn button large tall green"     href="client-area">Place new Job now as we deliver at the quickest   <em>reasonable time</em></a>{/RAW}</div></article></div>';

Doing a PHP var_dump of the result gives the following:
array(5) {
  [0]=>
  array(1) {
    [0]=>
    string(224) "{ACCESS{type=500}}
<ul>
<li>SEO work for an existing website or new</li>
<li>Bulk SMS</li>
<li>E-currency exchange</li>
<li>Facebook AD</li>
<li>Google AD</li>
</ul>
{/ACCESS}</div>
{RAW{say=email,access=500}} {RAW}"
  }
  [1]=>
  array(1) {
    [0]=>
    string(6) "ACCESS"
  }
  [2]=>
  array(1) {
    [0]=>
    string(209) "type=500}}
<ul>
<li>SEO work for an existing website or new</li>
<li>Bulk SMS</li>
<li>E-currency exchange</li>
<li>Facebook AD</li>
<li>Google AD</li>
</ul>
{/ACCESS}</div>
{RAW{say=email,access=500}"
  }
  [3]=>
  array(1) {
    [0]=>
    string(1) " "
  }
  [4]=>
  array(1) {
    [0]=>
    string(4) "{RAW"
  }
}

This is not what I need to retrieve. Once again, I want to capture the COMMAND_NAME, the attributes only if provided, the text in between if the {COMMAND_NAME} has a closing {/COMMAND_NAME}, and the closing {/COMMAND_NAME} if provided. That means the command can be inline ({COMMAND_NAME}) or wrap some content ({COMMAND_NAME} some strings {/COMMAND_NAME}), and it may or may not have attributes ({COMMAND_NAME{attr1=value attr2=value2}}).

Solution

This regex will work as you specified:

$regex = '~

#opening tag
\{(RAW|ACCESS|DWNLINK|MODL|\w+)
 #optional attributes
 (?>
     \{   ([^}]*)   }
 )?

}


#optional text and closing tag
(?:
    (   #text:= any char except "{", or a "{" not followed by /commandname
        [^{]*+
        (?>\{(?!/?\1[{}])[^{]*)*+
    )

    #closing tag
    (   \{/\1}   )
)?

~ix';
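As a rough sketch of how the pattern above might be used, the snippet below runs it through preg_match_all with PREG_SET_ORDER and collects the four captures per command. The function name parse_commands, the array keys and the sample string are my own assumptions for illustration, not part of the question.

```php
<?php
// Same pattern as above, reformatted; /x ignores whitespace and #comments.
$regex = '~
    \{(RAW|ACCESS|DWNLINK|MODL|\w+)            # group 1: command name
    (?> \{ ([^}]*) } )?                        # group 2: optional attributes
    }
    (?:
        ( [^{]*+ (?>\{(?!/?\1[{}])[^{]*)*+ )   # group 3: text in between
        ( \{/\1} )                             # group 4: closing tag
    )?
~ix';

// Hypothetical helper: returns one entry per command found in $html.
function parse_commands(string $regex, string $html): array
{
    $commands = [];
    preg_match_all($regex, $html, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $commands[] = [
            'name'       => $m[1],
            // Trailing groups that did not participate are absent from $m.
            'attributes' => $m[2] ?? '',
            'text'       => $m[3] ?? '',
            'closed'     => isset($m[4]) && $m[4] !== '',
        ];
    }
    return $commands;
}

$sample   = '{ACCESS{type=500}}<ul><li>SEO</li></ul>{/ACCESS} and {MODL} inline';
$commands = parse_commands($regex, $sample);
print_r($commands);
```

The first entry should describe ACCESS with its attributes, inner text and closing tag; the second should be the inline MODL command with nothing else captured.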

regex101 demo


Compared to what you had:

First of all, I used the /x modifier (at the end), which ignores whitespace and #comments.
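To illustrate what the x modifier does, here is a small sketch with a deliberately simplified pattern (not the full one): the compact and the commented versions should behave identically. The variable names and sample text are assumptions for the example.

```php
<?php
// Compact form of a simplified opening-tag pattern.
$compact = '~\{(\w+)}~';

// Same pattern in extended mode: whitespace and #comments are ignored.
$extended = '~
    \{        # literal opening brace
    (\w+)     # group 1: command name
    }         # literal closing brace
~x';

preg_match($compact, 'see {MODL} here', $a);
preg_match($extended, 'see {MODL} here', $b);
var_dump($a === $b);

// In extended mode an unescaped space in the pattern matches nothing,
// so this pattern is equivalent to "ab".
$ignored = preg_match('~a b~x', 'ab');
var_dump($ignored);
```

One consequence worth remembering: if the pattern needs to match a literal space under /x, it has to be written as \s, [ ] or an escaped space.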

In the opening tag, I used your options, but you may as well use \w+ to match any command name:

\{(RAW|ACCESS|DWNLINK|MODL|\w+)

For the optional attributes, you had [\{]{0,1}([\w\W\s]*?)\}{0}, which was a valid attempt to make every part optional. Instead, I'm using a (?> group )? (see non-capturing groups and atomic groups) to make the whole subpattern optional (with the ? quantifier).

 (?>
     \{   ([^}]*)   }
 )?
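A minimal sketch of the optional attribute group in isolation, assuming you then want the attributes as key/value pairs. The helper parse_attributes and its splitting rule (whitespace or commas, as both appear in your examples) are my own assumption, not something the pattern does for you.

```php
<?php
// Opening tag only: command name plus the optional {attributes} block.
$openTag = '~\{(\w+)(?>\{([^}]*)})?}~';

// Hypothetical helper: turns "a=1 b=2" or "a=1,b=2" into ['a' => '1', 'b' => '2'].
function parse_attributes(string $raw): array
{
    $attributes = [];
    foreach (preg_split('~[\s,]+~', $raw, -1, PREG_SPLIT_NO_EMPTY) as $pair) {
        // array_pad guards against a bare key without "=value".
        [$key, $value] = array_pad(explode('=', $pair, 2), 2, '');
        $attributes[$key] = $value;
    }
    return $attributes;
}

preg_match($openTag, '{RAW{say=email,access=500}}', $with);
preg_match($openTag, '{RAW}', $without);

// Group 2 is simply absent when the attribute block did not match.
$withAttrs    = parse_attributes($with[2] ?? '');
$withoutAttrs = parse_attributes($without[2] ?? '');

print_r($withAttrs);
print_r($withoutAttrs);
```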

The same logic is applied to the text and closing tag, to make it optional.

You were using [\w\s]+ to match the text, which matches word characters and whitespace, but fails to match punctuation and other characters. I could have used .*? and it would work just as well. However, I used the following construct, which matches the same, but performs better:

    (   #text:= any char except "{", or a "{" not followed by /commandname
        [^{]*+
        (?>\{(?!/?\1[{}])[^{]*)*+
    )
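For comparison, here is a sketch that runs the lazy .*? alternative and the possessive construct from the full pattern on the same sample. Both patterns are simplified (no attributes, mandatory closing tag) and the sample string is an assumption. Note that .*? needs the s modifier if the text may span several lines, since the dot does not match newlines by default.

```php
<?php
// Lazy version: expands one character at a time until the closing tag fits.
$lazy = '~\{(\w+)}(.*?)\{/\1}~s';

// Unrolled version: consumes runs of non-brace characters in one step and
// only accepts a "{" that does not start an opening or closing tag of this command.
$unrolled = '~\{(\w+)}([^{]*+(?>\{(?!/?\1[{}])[^{]*)*+)\{/\1}~';

$sample = '{RAW}a {b} <i>c</i>{/RAW}';

preg_match($lazy, $sample, $x);
preg_match($unrolled, $sample, $y);

var_dump($x[2]);
var_dump($y[2]);
```

On this sample both should capture the same inner text; the difference is in how much backtracking the engine has to do on long inputs.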

And finally, I'm matching the closing tag using \1, which is a backreference to the text matched in group 1 (the opening tag name):

\{/\1}
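A short sketch of the backreference on its own, using a simplified pattern and made-up sample strings: the closing tag only pairs with an opening tag of the same name, and with the i modifier the comparison is case-insensitive as well.

```php
<?php
// Simplified pairing pattern: {NAME}text{/NAME}, where \1 repeats group 1.
$paired = '~\{(\w+)}([^{]*)\{/\1}~i';

$same      = preg_match($paired, '{RAW}text{/RAW}', $m); // names agree
$different = preg_match($paired, '{RAW}text{/MODL}');    // names differ
$mixedCase = preg_match($paired, '{raw}text{/RAW}');     // differs only in case

var_dump($same, $different, $mixedCase);
```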


Assumptions:
