Remove nested quotes
Question
I have this text and I'm trying to remove all the inner quotes, keeping just one quoting level. The text inside a quote can contain any characters, even line feeds. Is this possible with a regex, or do I have to write a little parser?
[quote=foo]I really like the movie. [quote=bar]World
War Z[/quote] It's amazing![/quote]
This is my comment.
[quote]Hello, World[/quote]
This is another comment.
[quote]Bye Bye Baby[/quote]
This is the text I want:
[quote=foo]I really like the movie. It's amazing![/quote]
This is my comment.
[quote]Hello, World[/quote]
This is another comment.
[quote]Bye Bye Baby[/quote]
This is the regex I'm using in PHP:
%\[quote\s*(=[a-zA-Z0-9\-_]*)?\](.*)\[/quote\]%si
I also tried this variant, but it doesn't match "." or ",", and I can't figure out what else I might find inside a quote:
%\[quote\s*(=[a-zA-Z0-9\-_]*)?\]([\w\s]+)\[/quote\]%i
The problem is here:
(.*)
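To illustrate why (.*) is the weak spot, here is a minimal sketch using a shortened sample string (the string and the variable names are assumptions made for illustration, not part of the original post). With the s modifier, the greedy (.*) runs to the last [/quote] in the whole text, while a lazy (.*?) stops at the first [/quote], which belongs to the inner quote. Neither form can pair an opening tag with its own closing tag.

```php
<?php
// Shortened sample, assumed for illustration only.
$text = "[quote=foo]A [quote=bar]B[/quote] C[/quote]\nmid\n[quote]D[/quote]";

// The pattern from the question: greedy (.*) with the s and i modifiers.
preg_match('%\[quote\s*(=[a-zA-Z0-9\-_]*)?\](.*)\[/quote\]%si', $text, $m);
$greedy = $m[2];

// Same pattern with a lazy quantifier instead.
preg_match('%\[quote\s*(=[a-zA-Z0-9\-_]*)?\](.*?)\[/quote\]%si', $text, $m);
$lazy = $m[2];

// Greedy swallows everything up to the LAST [/quote] in the text.
var_dump($greedy);
// Lazy stops at the FIRST [/quote], i.e. the one closing the inner quote.
var_dump($lazy);
```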
Recommended answer
You can use:
$result = preg_replace('~\G(?!\A)(?>(\[quote\b[^]]*](?>[^[]+|\[(?!/?quote)|(?1))*\[/quote])|(?<!\[)(?>[^[]+|\[(?!/?quote))+\K)|\[quote\b[^]]*]\K~', '', $text);
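As a usage sketch, the call above can be wrapped in a small function and run against the sample text from the question. The function name removeNestedQuotes and the error handling are assumptions added for illustration; the pattern itself is copied unchanged from the answer.

```php
<?php
// Hypothetical wrapper; only the pattern comes from the answer.
function removeNestedQuotes(string $text): string
{
    $pattern = '~\G(?!\A)(?>(\[quote\b[^]]*](?>[^[]+|\[(?!/?quote)|(?1))*\[/quote])|(?<!\[)(?>[^[]+|\[(?!/?quote))+\K)|\[quote\b[^]]*]\K~';
    $result = preg_replace($pattern, '', $text);
    if ($result === null) {
        // preg_replace returns null on failure (for example, a backtrack limit).
        throw new RuntimeException('preg_replace failed, error code ' . preg_last_error());
    }
    return $result;
}

// Sample text from the question.
$text = "[quote=foo]I really like the movie. [quote=bar]World\n"
      . "War Z[/quote] It's amazing![/quote]\n"
      . "This is my comment.\n"
      . "[quote]Hello, World[/quote]\n"
      . "This is another comment.\n"
      . "[quote]Bye Bye Baby[/quote]";

echo removeNestedQuotes($text), "\n";
```

Only the inner [quote=bar] block is removed. The spaces that surrounded it are left in place, so the first line ends up with two spaces between "movie." and "It's".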
Details:
\G(?!\A)                     # contiguous to the previous match
(?>                          ## content inside "quote" tags at level 0
    (                        ## nested "quote" tags (group 1)
        \[quote\b[^]]*]
        (?>                  ## content inside "quote" tags at any level
            [^[]+
          |                  # OR
            \[(?!/?quote)
          |                  # OR
            (?1)             # repeat the capture group 1 (recursive)
        )*
        \[/quote]
    )
  |
    (?<!\[)                  # not preceded by an opening square bracket
    (?>                      ## content that is not a quote tag
        [^[]+                # all that is not a [
      |                      # OR
        \[(?!/?quote)        # a [ not followed by "quote" or "/quote"
    )+\K                     # repeat 1 or more and reset the match
)
|                            # OR
\[quote\b[^]]*]\K            # "quote" tag at level 0
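For comparison, the "little parser" route mentioned in the question can be sketched as below. This is not part of the answer: the function name stripNestedQuotes and the depth-counter approach are assumptions, shown only as one possible alternative to the recursive pattern. It splits the text on quote tags and keeps tags and text only while the nesting depth is at most 1.

```php
<?php
// Hypothetical alternative to the regex above: a depth-counting tokenizer.
function stripNestedQuotes(string $text): string
{
    // Split on opening/closing quote tags and keep the tags as tokens.
    // With one capture group, even indexes are plain text, odd indexes are tags.
    $tokens = preg_split('~(\[quote\b[^\]]*\]|\[/quote\])~i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    $depth = 0;
    $out = '';
    foreach ($tokens as $i => $token) {
        if ($i % 2 === 0) {
            // Plain text: keep it outside quotes and inside level-1 quotes.
            if ($depth <= 1) {
                $out .= $token;
            }
        } elseif ($token[1] === '/') {
            // Closing tag: keep it when it closes a level-1 quote or is stray.
            if ($depth <= 1) {
                $out .= $token;
            }
            if ($depth > 0) {
                $depth--;
            }
        } else {
            // Opening tag: keep only the outermost one.
            $depth++;
            if ($depth === 1) {
                $out .= $token;
            }
        }
    }
    return $out;
}

echo stripNestedQuotes("[quote=foo]A [quote=bar]B[/quote] C[/quote] D"), "\n";
```

Unlike the pattern in the answer, this sketch matches the tags case-insensitively, and an inner quote that is never closed causes everything after it to be dropped, so unbalanced input may need separate handling.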