Regex matching nested beginning and ending tags


Question

Here are the strings from which I'd like to extract the content between the {{if}} and {{/if}} tags. I mean the first and last tags (the inner ones will be rechecked by the engine):

  • "before {{if^^p1^p2}} IN1; {{if^ ^p1}} {{iif}} IN3 {{/if}} IN1-1 {{/if}} after"
  • "before {{if^ ^p1}} IN1; {{if^ ^p1}} {{if^ ^p1}} IN3 {{/if}} {{/if}} IN1-1 {{/if}} after"
  • "before {{if^ ^p1}} IN1; {{if^ ^p1}} {{if^ ^p1}} IN3 {{/if}} {{/if}} IN1-1 {{if^ ^p1}} IN4 {{/if}} {{/if}} after"

The regex is: \{\{(if)\}\}(((?!\{\{\/?\1\}\})[\s\S])*(\{\{\1\}\}(?2)*\{\{\/\1\}\})*((?!\{\{\/?\1\}\})[\s\S])*)\{\{\/\1\}\}

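EDIT 3: I removed the requirement to support a tag without a closing one. I reformatted the question for future users; to understand some of the comments below, see the first version of the post.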

Moreover, I need it to work for all three strings at the same time, giving me three matches, which is not working on the regex101 website. Line breaks have to be supported within the match. That said, I could accept that only the last two strings combined give two matches, because I could change the tag of the lone if to iif.

My other solution does not use regular expressions, but I would like to use one if possible.

Recommended answer

You can use:

~{{             # Opening tag start
  (\w+)         # (Group 1) Tag name
  \^            # Aux delimiter
  ([^^\{\}]?)   # (Group 2) Specific delimiter
  \^            # Aux delimiter
  ([^\{\}]+)    # (Group 3) Parameters
 }}             # Opening tag end
  (             # (Group 4)
   (?>          
     (?R)       # Repeat the whole pattern
     |          # or match all that is not the opening/closing tag
     [^{]*(?:\{(?!{/?\1[^\{\}]*}})[^{]*)*
   )*           # Zero or more times
  )
 {{/\1}}        # Closing tag
~ix

See the regex demo.

In general, the expression is based on recursion and a tempered greedy token. The [^{]*(?:\{(?!{/?\1[^\{\}]*}})[^{]*)* part is an unrolled (?s:(?!{{/?\1}}).)* pattern that matches any character (.) that is not the starting point of a {{TAG}} or {{/TAG}} character sequence.

You do not need a DOTALL modifier for this pattern as there is no . in the pattern.
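
To make the relationship between the two forms concrete, here is a minimal sketch that runs a tempered greedy token and its unrolled equivalent against the same text. It assumes the tag name is hard-coded as if (instead of the \1 backreference used in the full pattern), and the sample text and variable names are made up for illustration.

```php
<?php
// Tempered greedy token: consume one character at a time, but only while the
// current position is not the start of {{if...}} or {{/if...}}.
// The s flag is needed here because this form uses a dot.
$tempered = '~^(?s:(?!\{\{/?if[^{}]*\}\}).)*~';

// Unrolled form: take runs of non-brace characters, and accept a single "{"
// only when it does not begin an opening or closing if tag. No dot, so no s flag.
$unrolled = '~^[^{]*(?:\{(?!\{/?if[^{}]*\}\})[^{]*)*~';

// Illustrative input: a lone brace, a different tag ({{iif}}), and a line break.
$text = "IN1 {x} {{iif}}\nmore {{/if}} tail";

preg_match($tempered, $text, $a);
preg_match($unrolled, $text, $b);

// Both forms stop right before {{/if}}.
var_dump($a[0] === $b[0]);
```

Both patterns consume the same prefix; the unrolled version reaches it with far fewer lookahead checks, since the lookahead only runs at each { rather than at every character.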

Here is a PHP demo:

$re = '~{{(\w+)\^([^^\{\}]?)\^([^\{\}]+)}}((?>(?R)|[^{]*(?:\{(?!{/?\1[^\{\}]*}})[^{]*)*)*){{/\1}}~i'; 
$str = "before {{if^^p1^p2}} IN1; {{if^ ^p1}} {{iif}} IN3 {{/if}} IN1-1 {{/if}} after\nbefore {{if^ ^p1}} IN1; {{if^ ^p1}} {{if^ ^p1}} IN3 {{/if}} {{/if}} IN1-1 {{/if}} after\nbefore {{if^ ^p1}} IN1; {{if^ ^p1}} {{if^ ^p1}} IN3 {{/if}} {{/if}} IN1-1 {{if^ ^p1}} IN4 {{/if}} {{/if}} after"; 
preg_match_all($re, $str, $matches);
print_r($matches);
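
Since the question mentions that inner tags are rechecked later, the following sketch shows one possible way to do that with the same pattern: match the outermost tags, then run the pattern again on each match's inner content (group 4). The collectTags helper, the TAG_RE constant and the row layout are hypothetical names introduced for illustration, not part of the original answer.

```php
<?php
// Pattern copied from the demo above.
const TAG_RE = '~{{(\w+)\^([^^\{\}]?)\^([^\{\}]+)}}((?>(?R)|[^{]*(?:\{(?!{/?\1[^\{\}]*}})[^{]*)*)*){{/\1}}~i';

// Hypothetical helper: returns one row per tag, outermost first, by
// re-running the pattern on the inner content (group 4) of every match.
function collectTags(string $text, int $depth = 0): array
{
    $rows = [];
    preg_match_all(TAG_RE, $text, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $rows[] = [
            'depth'  => $depth,
            'tag'    => $m[1],       // group 1: tag name
            'params' => $m[3],       // group 3: parameters
            'inner'  => trim($m[4]), // group 4: content between the tags
        ];
        // Recheck the inner content for nested tags.
        $rows = array_merge($rows, collectTags($m[4], $depth + 1));
    }
    return $rows;
}

$str = "before {{if^^p1^p2}} IN1; {{if^ ^p1}} {{iif}} IN3 {{/if}} IN1-1 {{/if}} after";
foreach (collectTags($str) as $row) {
    echo str_repeat('  ', $row['depth']), $row['tag'], ' [', $row['params'], '] ', $row['inner'], "\n";
}
```

For this input the helper reports two tags: the outer if with parameters p1^p2, and the nested if with parameters p1. The lone {{iif}} is skipped because it has no ^ delimiters and no closing tag.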
