Recursive regular expression to process nested strings enclosed by {| and |}
Question
In a project I have text with patterns like this:
{| text {| text |} text |}
more text
I want to get the first bracketed part. For this I use preg_match with a recursive pattern. The following code already works fine:
preg_match('/\{((?>[^\{\}]+)|(?R))*\}/x',$text,$matches);
But if I add the "|" symbol to the delimiters, I get an empty result and I don't know why:
preg_match('/\{\|((?>[^\{\}]+)|(?R))*\|\}/x',$text,$matches);
I can't use the first solution because the text may also contain something like { text }. Can somebody tell me what I am doing wrong here? Thanks.
Recommended answer
Try this:
'/(?s)\{\|(?:(?:(?!\{\||\|\}).)++|(?R))*\|\}/'
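As a minimal sketch of how the pattern above could be applied, the snippet below runs it through preg_match on input modelled on the question's sample. The variable names and the sample string are assumptions made for illustration, not part of the original post.

```php
<?php
// Sample input modelled on the question; names here are illustrative only.
$text = "{| text {| text |} text |}\nmore text";

// The recursive pattern from the answer, unchanged.
$pattern = '/(?s)\{\|(?:(?:(?!\{\||\|\}).)++|(?R))*\|\}/';

// preg_match stops at the first match, which is the outermost {| ... |} block.
if (preg_match($pattern, $text, $matches) === 1) {
    echo $matches[0], PHP_EOL;
} else {
    echo "no match", PHP_EOL;
}
```

With this input, $matches[0] should hold the whole outer block, nested part included, and "more text" is left out.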
In your original regex you use the character class [^{}] to match anything except a delimiter. That's fine when the delimiters are only one character, but yours are two characters. The class also consumes the | that belongs to a closing |}, and the atomic group will not give it back, so \|\} can never match. To not-match a multi-character sequence you need something like this:
(?:(?!\{\||\|\}).)++
The dot matches any character (including newlines, thanks to the (?s)), but only after the lookahead has determined that it's not the start of a {| or |} sequence. I also dropped your atomic group ((?>...)) and replaced it with a possessive quantifier (++) to reduce clutter. But you should definitely use one or the other in that part of the regex to prevent catastrophic backtracking.
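To make the comparison concrete, here is a small sketch that runs three patterns against the same input: the question's second pattern, the possessive form from this answer, and an equivalent written with an atomic group. The sample string, the array keys, and the atomic-group rewrite are assumptions added for illustration.

```php
<?php
// Illustrative input: a plain { ... } pair sits inside the {| ... |} block.
$text = "{| a { plain } {| b |} c |} tail";

$patterns = [
    // From the question: [^\{\}]+ also eats the "|" of a closing "|}".
    'original'   => '/\{\|((?>[^\{\}]+)|(?R))*\|\}/x',
    // From the answer: lookahead-guarded dot with a possessive quantifier.
    'possessive' => '/(?s)\{\|(?:(?:(?!\{\||\|\}).)++|(?R))*\|\}/',
    // Assumed equivalent: the same guarded dot inside an atomic group.
    'atomic'     => '/(?s)\{\|(?:(?>(?:(?!\{\||\|\}).)+)|(?R))*\|\}/',
];

$results = [];
foreach ($patterns as $name => $pattern) {
    // Store the matched text, or null when the pattern finds nothing.
    $results[$name] = preg_match($pattern, $text, $m) === 1 ? $m[0] : null;
    echo $name, ': ', $results[$name] ?? '(no match)', PHP_EOL;
}
```

On this input the question's pattern should find nothing, while both guarded variants should return the outer block with the plain { plain } pair left intact inside it.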