Regex to match all HTML tags except <p> and </p>
Problem description
I need to match and remove all tags using a regular expression in Perl. I have the following:
<\??(?!p).+?>
But this still matches the closing </p> tag. Any hints on how to exclude the closing tag as well?
Note: this is being performed on XHTML.
Recommended answer
I came up with this:
<(?!/?p(?=>|\s.*>))/?.*?>
m{
<           # Match open angle bracket
(?!         # Negative lookahead (not matching and not consuming)
    /?      # 0 or 1 /
    p       # p
    (?=     # Positive lookahead (matching and not consuming)
    >       # > - no attributes
    |       # or
    \s      # whitespace
    .*      # anything up to
    >       # close angle bracket - with attributes
    )       # close positive lookahead
)           # close negative lookahead
            # if we have got this far then we are not looking at
            # a p tag or closing p tag
            # with or without attributes
/?          # optional close tag symbol (/)
.*?         # and anything up to
>           # first closing angle bracket
}x
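As a rough usage sketch, the pattern can be dropped into a global substitution to remove the matched tags. The sample string below is hypothetical (it is not from the question), and the s{...}{} delimiters are my own choice so that the forward slashes need no escaping.

```perl
use strict;
use warnings;

# Hypothetical sample input, made up for illustration.
my $xhtml = '<div><p class="x">Hello <b>world</b></p><pre>code</pre></div>';

# Copy the input, then delete every tag the pattern matches.
# Brace delimiters mean the / characters in the pattern need no backslash.
(my $stripped = $xhtml) =~ s{<(?!/?p(?=>|\s.*>))/?.*?>}{}g;

print "$stripped\n";   # prints: <p class="x">Hello world</p>code
```

Note that only the tags are removed; the text between them (including the contents of the pre element) is left in place.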
This leaves <p> tags (with or without attributes) and closing </p> tags alone, but it still matches <pre> and similar tags whose names merely start with p, with or without attributes.
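To make the distinction concrete, here is a small sketch that checks a handful of tags against the pattern. The tag list is hypothetical and chosen only for illustration. Under this pattern, <p>, </p> and <p id="a"> are kept, while <pre>, </pre>, <param name="x"/> and <br/> are matched and would be removed.

```perl
use strict;
use warnings;

# The answer's pattern, compiled once. Brace delimiters avoid escaping the slashes.
my $tag_re = qr{<(?!/?p(?=>|\s.*>))/?.*?>};

# Hypothetical test tags, chosen for illustration.
my @tags = ('<p>', '</p>', '<p id="a">', '<pre>', '</pre>', '<param name="x"/>', '<br/>');

for my $tag (@tags) {
    # A match means the tag would be stripped by a substitution using this pattern.
    my $verdict = $tag =~ $tag_re ? 'removed' : 'kept';
    printf "%-20s %s\n", $tag, $verdict;
}
```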
It doesn't strip attributes from the <p> tags it keeps, but my source data doesn't contain any. I may change it later to handle that, but this will suffice for now.