Remove nested quotes


Question

I have this text and I'm trying to remove all the inner quotes, keeping just one quoting level. The text inside a quote can contain any characters, including line feeds. Is this possible with a regex, or do I have to write a small parser?

[quote=foo]I really like the movie. [quote=bar]World 

War Z[/quote] It's amazing![/quote]
This is my comment.
[quote]Hello, World[/quote]
This is another comment.
[quote]Bye Bye Baby[/quote]

This is the text I want:

[quote=foo]I really like the movie.  It's amazing![/quote]
This is my comment.
[quote]Hello, World[/quote]
This is another comment.
[quote]Bye Bye Baby[/quote]

This is the regex I'm using in PHP:

%\[quote\s*(=[a-zA-Z0-9\-_]*)?\](.*)\[/quote\]%si

I also tried this variant, but it doesn't match a period or a comma, and I can't predict what else might appear inside a quote:

%\[quote\s*(=[a-zA-Z0-9\-_]*)?\]([\w\s]+)\[/quote\]%i

The problem is here:

(.*)
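
To see why (.*) causes trouble: with the s modifier the dot also matches newlines, and the greedy * extends as far as it can, so the capture runs from the first opening tag to the last closing tag in the whole text. A minimal sketch of that behaviour follows; the sample string is a shortened, hypothetical stand-in for the text above.

```php
<?php
// Hypothetical, shortened sample (not the exact text from the question).
$text = "[quote=foo]A [quote=bar]B[/quote] C[/quote]\n"
      . "comment\n"
      . "[quote]D[/quote]";

// The pattern from the question, unchanged.
$pattern = '%\[quote\s*(=[a-zA-Z0-9\-_]*)?\](.*)\[/quote\]%si';

preg_match($pattern, $text, $m);

// Group 2 spans from just after the FIRST opening tag
// to just before the LAST closing tag, comment line included.
var_dump($m[2]);
```

Making the quantifier lazy, as in (.*?), would stop at the first closing tag instead, which is the inner one, so that change alone does not handle nesting either.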

Recommended answer

You can use:

$result = preg_replace('~\G(?!\A)(?>(\[quote\b[^]]*](?>[^[]+|\[(?!/?quote)|(?1))*\[/quote])|(?<!\[)(?>[^[]+|\[(?!/?quote))+\K)|\[quote\b[^]]*]\K~', '', $text);

Details:

\G(?!\A)              # contiguous to the previous match
(?>                   ## content inside "quote" tags at level 0
  (                    ## nested "quote" tags (group 1)
    \[quote\b[^]]*]
    (?>                ## content inside "quote" tags at any level
      [^[]+
     |                  # OR
      \[(?!/?quote)
     |                  # OR
      (?1)              # repeat the capture group 1 (recursive)
    )*
    \[/quote]
  )
 |
  (?<!\[)           # not preceded by an opening square bracket
  (?>              ## content that is not a quote tag
    [^[]+           # all that is not a [
   |                # OR
    \[(?!/?quote)   # a [ not followed by "quote" or "/quote"
  )+\K              # repeat 1 or more and reset the match
)
|                   # OR
\[quote\b[^]]*]\K   # "quote" tag at level 0 
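
The question also asks whether a small parser would be needed. As an alternative to the recursive pattern above (a different technique, not the answer's method), the text can be split on quote tags and filtered with a depth counter. The sketch below assumes tags look like [quote], [quote=name] and [/quote]; the function name stripNestedQuotes is hypothetical.

```php
<?php
/**
 * Keep only the outermost [quote]...[/quote] level and drop nested quotes.
 * Hypothetical helper, shown as a sketch rather than a drop-in replacement.
 */
function stripNestedQuotes(string $text): string
{
    // Split on opening/closing quote tags and keep the tags in the token list.
    $tokens = preg_split(
        '~(\[quote\b[^\]]*\]|\[/quote\])~i',
        $text,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );
    if ($tokens === false) {
        return $text; // regex failure: return the input untouched
    }

    $depth = 0; // current quote nesting level
    $out = '';

    foreach ($tokens as $token) {
        if (preg_match('~^\[quote\b[^\]]*\]$~i', $token)) {
            // Opening tag: emit it only when it opens the outermost level.
            $depth++;
            if ($depth === 1) {
                $out .= $token;
            }
        } elseif (strcasecmp($token, '[/quote]') === 0) {
            // Closing tag: emit it for the outermost level (or if it is stray).
            if ($depth <= 1) {
                $out .= $token;
            }
            if ($depth > 0) {
                $depth--;
            }
        } elseif ($depth <= 1) {
            // Plain text: keep it outside quotes and at the first quote level.
            $out .= $token;
        }
    }

    return $out;
}

$sample = "[quote=foo]I really like the movie. [quote=bar]World \n\nWar Z[/quote] It's amazing![/quote]\nThis is my comment.";
echo stripNestedQuotes($sample), "\n";
```

Unbalanced input is handled leniently here: a stray closing tag is kept as-is, and an unclosed inner quote swallows the rest of its enclosing quote. Whether that is the right policy depends on the data.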
