Regex to match all HTML tags except <p> and </p>
Problem description
I need to match and remove all tags using a regular expression in Perl. I have the following:
<\\??(?!p).+?>
But this still matches the closing </p> tag. Any hint on how to exclude the closing tag as well?
Note, this is being performed on xhtml.
I came up with this:
<(?!\/?p(?=>|\s.*>))\/?.*?>

x/
<           # Match open angle bracket
(?!         # Negative lookahead (Not matching and not consuming)
    \/?     # 0 or 1 /
    p       # p
    (?=     # Positive lookahead (Matching and not consuming)
        >   # > - No attributes
        |   # or
        \s  # whitespace
        .*  # anything up to
        >   # close angle brackets - with attributes
    )       # close positive lookahead
)           # close negative lookahead
            # if we have got this far then we don't match
            # a p tag or closing p tag
            # with or without attributes
\/?         # optional close tag symbol (/)
.*?         # and anything up to
>           # first closing tag
/
This now leaves p tags (with or without attributes) and closing p tags alone, but still matches pre and similar tags, with or without attributes.
It doesn't strip attributes out of the p tags it keeps, but my source data doesn't include any. I may change this later, but it will suffice for now.
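For illustration, here is a minimal sketch of how the pattern above might be used in a substitution. The helper name strip_non_p_tags and the sample input are made up for this example and are not part of the original answer. The pattern is the same one, written with braces as delimiters so the slashes need no escaping.

```perl
use strict;
use warnings;

# Hypothetical helper (name chosen for this sketch): removes every tag
# except <p ...> and </p>, using the pattern from the answer above.
sub strip_non_p_tags {
    my ($xhtml) = @_;
    $xhtml =~ s{
        <                             # opening angle bracket
        (?! /? p (?= > | \s .* > ) )  # not a p tag, with or without attributes
        /?                            # optional slash of a closing tag
        .*?                           # anything up to
        >                             # the first closing angle bracket
    }{}gx;
    return $xhtml;
}

# Made-up sample input, only to show the behaviour.
my $input  = '<div><p>Hello <b>world</b></p><pre>code</pre><p class="x">Bye</p></div>';
my $output = strip_non_p_tags($input);
print "$output\n";   # prints: <p>Hello world</p>code<p class="x">Bye</p>
```

Note that the pre tags are removed while their text content stays, and the attribute on the second p tag is left untouched, as described above. Because . does not match newlines without the /s flag, a tag that spans several lines would not be matched by this sketch.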