Recursive regular expression to process nested strings enclosed by {| and |}

Problem description

In a project I have text containing patterns like this:

{| text {| text |} text |}
more text

I want to extract the first bracketed part, including anything nested inside it. For this I use preg_match with a recursive pattern. The following code already works for plain braces:

preg_match('/\{((?>[^\{\}]+)|(?R))*\}/x',$text,$matches);

But if I add the symbol "|" to the delimiters, I get an empty result and I don't know why:

preg_match('/\{\|((?>[^\{\}]+)|(?R))*\|\}/x',$text,$matches);

I can't use the first solution because the text may also contain something like { text }. Can somebody tell me what I am doing wrong here? Thanks.

Recommended answer

Try this:

'/(?s)\{\|(?:(?:(?!\{\||\|\}).)++|(?R))*\|\}/'
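As a quick check, the pattern above can be run against the sample from the question. This is only a minimal sketch: the variable names and the exact sample string are assumptions made for illustration, and the pattern itself is copied unchanged.

```php
<?php
// Sample input modelled on the question: one nested {| ... |} block,
// followed by unrelated trailing text on the next line.
$text = "{| text {| text |} text |}\nmore text";

// Pattern from the answer, unchanged. (?R) recurses into the whole
// pattern whenever a nested "{|" is encountered.
$pattern = '/(?s)\{\|(?:(?:(?!\{\||\|\}).)++|(?R))*\|\}/';

if (preg_match($pattern, $text, $matches) === 1) {
    // $matches[0] holds the whole outer block, nested block included.
    echo $matches[0], PHP_EOL; // prints: {| text {| text |} text |}
}
```

The trailing "more text" is left out of the match because the outermost |} closes the block before it.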

In your original regex you use the character class [^{}] to match anything except a delimiter. That's fine when the delimiters are only one character long, but yours are two characters long. To avoid matching a multi-character sequence you need something like this:

(?:(?!\{\||\|\}).)++
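To see the difference in isolation, the fragment above can be compared with the single-character class on a short string. This is a sketch with a made-up input string and hypothetical variable names, not part of the original answer.

```php
<?php
// Hypothetical input: contains single braces and one closing "|}".
$input = 'a { b } c |} d';

// Fragment from the answer: consume characters one at a time, but only
// while the current position does not start "{|" or "|}".
$tempered = '/(?:(?!\{\||\|\}).)++/';

// Character class from the question: stops at ANY single brace.
$charClass = '/[^{}]+/';

preg_match($tempered, $input, $m);
preg_match($charClass, $input, $n);

var_dump($m[0]); // string(10) "a { b } c "  (single braces pass, stops before "|}")
var_dump($n[0]); // string(2) "a "           (stops at the first "{")
```

This also suggests why the second pattern in the question returns nothing: [^\{\}]+ consumes the | of a closing |}, and the atomic group will not give it back, so the following \|\} has nothing left to match.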

The dot matches any character (including newlines, thanks to the (?s)), but only after the lookahead has determined that it is not the start of a {| or |} sequence. I also dropped your atomic group ((?>...)) and replaced it with a possessive quantifier (++) to reduce clutter. Either way, you should definitely use one or the other in that part of the regex to prevent catastrophic backtracking.
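The two spellings can be compared side by side. In the sketch below, the atomic-group variant is my own rewrite of the answer's pattern (an assumption, not something the answer spells out), and the input string is invented to include a newline and a plain { c } inside the block.

```php
<?php
// Invented input: a plain "{ plain }" before the block, a newline and a
// single-brace "{ c }" inside it, and trailing text after it.
$text = "intro { plain } {| a\n{| b |} { c } |} tail";

// Possessive quantifier, as in the answer.
$possessive = '/(?s)\{\|(?:(?:(?!\{\||\|\}).)++|(?R))*\|\}/';

// Same idea written with an atomic group (assumed equivalent rewrite).
$atomic = '/(?s)\{\|(?:(?>(?:(?!\{\||\|\}).)+)|(?R))*\|\}/';

preg_match($possessive, $text, $m1);
preg_match($atomic, $text, $m2);

var_dump($m1[0] === $m2[0]);                    // bool(true)
var_dump($m1[0] === "{| a\n{| b |} { c } |}");  // bool(true)
```

Both variants skip the leading { plain }, cross the newline because of (?s), and treat { c } as ordinary content, which is the behaviour the question asked for.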
