Regex to match all HTML tags except <p> and </p>

Question

I need to match and remove all tags using a regular expression in Perl. I have the following:

<\??(?!p).+?>

But this still matches the closing </p> tag. Any hints on how to exclude the closing tag as well?

Note, this is being performed on xhtml.

Recommended answer

I came up with this:

<(?!\/?p(?=>|\s.*>))\/?.*?>

m{
<           # match the opening angle bracket
(?!         # negative lookahead (asserts, does not consume)
    \/?     # 0 or 1 /
    p       # p
    (?=     # positive lookahead (asserts, does not consume)
        >   # > immediately: no attributes
        |   # or
        \s  # whitespace
        .*  # anything up to
        >   # closing angle bracket: with attributes
    )       # close positive lookahead
)           # close negative lookahead
            # if we have got this far then this is not
            # a p tag or closing p tag,
            # with or without attributes
\/?         # optional close-tag symbol (/)
.*?         # and anything up to
>           # the first closing angle bracket
}x
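To make the removal step concrete, here is a minimal sketch of applying the pattern with a global substitution. The helper name strip_non_p_tags and the sample markup are assumptions made for illustration and are not part of the original answer. The pattern is compiled with {} delimiters so the slashes need no escaping.

```perl
use strict;
use warnings;

# Same pattern as in the answer, compiled once.
my $non_p_tag = qr{<(?!/?p(?=>|\s.*>))/?.*?>};

# Hypothetical helper: returns a copy of the input with every tag
# removed except <p ...> and </p>.
sub strip_non_p_tags {
    my ($xhtml) = @_;
    (my $out = $xhtml) =~ s/$non_p_tag//g;
    return $out;
}

# Illustrative input only.
my $sample = '<div><p>Hello <b>world</b></p><br/></div>';
print strip_non_p_tags($sample), "\n";   # prints: <p>Hello world</p>
```

Because . does not match a newline by default, a tag that spans several lines would not be matched by this sketch; adding the /s modifier to the pattern is one way to change that.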

This leaves p tags (with or without attributes) and closing p tags alone, but still matches pre and similar tags, with or without attributes.
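The behaviour described above can be checked tag by tag. The following is a small sketch, assuming a hypothetical helper is_removed and a hand-picked list of tags. It also shows one edge case worth knowing on xhtml: a self-closing <p/> is neither followed by > nor by whitespace after the p, so the pattern treats it like any other tag.

```perl
use strict;
use warnings;

my $non_p_tag = qr{<(?!/?p(?=>|\s.*>))/?.*?>};

# Hypothetical helper: true (1) if the whole string is a tag the
# pattern would match, and therefore remove; false (0) otherwise.
sub is_removed {
    my ($tag) = @_;
    return $tag =~ /\A$non_p_tag\z/ ? 1 : 0;
}

# Tags chosen for illustration only.
for my $tag ('<p>', '</p>', '<p class="x">', '<pre>', '</pre>', '<p/>') {
    printf "%-16s %s\n", $tag, is_removed($tag) ? 'matched' : 'kept';
}
```

The first three tags are kept; <pre>, </pre>, and <p/> are matched.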

It doesn't strip out attributes, but my source data does not put them in. I may change this later to do this, but this will suffice for now.
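If attributes on p tags ever did need to be stripped, one possible follow-up pass is sketched below. This is not from the original answer: the helper name strip_p_attributes is hypothetical, and the sketch assumes that attribute values never contain a > character and that p tags are not self-closing.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrites any opening p tag that carries
# attributes to a bare <p>. The \s after p keeps tags such as
# <pre ...> or <param ...> from being touched.
sub strip_p_attributes {
    my ($xhtml) = @_;
    (my $out = $xhtml) =~ s/<p\s[^>]*>/<p>/g;
    return $out;
}

# Illustrative input only.
print strip_p_attributes('<p class="intro" id="a">Hi</p>'), "\n";   # prints: <p>Hi</p>
```

Running this after the tag-removal substitution would leave only bare <p> and </p> tags in the output, under the assumptions stated above.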
