Remove unnecessary attributes from html tag using JavaScript RegEx

Problem description

I'm new to regular expressions and am trying to filter HTML tags so that only the required attributes (src / href / style) are kept with their values and all other attributes are removed. While googling I found a regular expression that keeps only the "src" attribute, and I modified it as follows:

<([a-z][a-z0-9]*)(?:[^>]*(\s(src|href|style)=['\"][^'\"]*['\"]))?[^>]*?(\/?)>

It works fine, but the only problem is that if a tag contains more than one required attribute, it keeps only the last matched attribute and discards the rest.

I'm trying to clean the following text:

<title>Hello World</title>
<div fadeout"="" style="margin:0px;" class="xyz">
    <img src="abc.jpg" alt="" />
    <p style="margin-bottom:10px;">
        The event is celebrating its 50th anniversary K&ouml;&nbsp;
        <a style="margin:0px;" href="http://www.germany.travel/">exhibition grounds in Cologne</a>.
    </p>
    <p style="padding:0px;"></p>
    <p style="color:black;">
        <strong>A festival for art lovers</strong>
    </p>
</div>

I tested the expression above at https://regex101.com/#javascript with <$1$2$4> as the substitution string and got the following output:

<title>Hello World</title>
<div style="margin:0px;">
    <img src="abc.jpg"/>
    <p style="margin-bottom:10px;">
        The event is celebrating its 50th anniversary K&ouml;&nbsp;
        <a href="http://www.germany.travel/">exhibition grounds in Cologne</a>.
    </p>
    <p style="padding:0px;"></p>
    <p style="color:black;">
        <strong>A festival for art lovers</strong>
    </p>
</div>

The problem is that the "style" attribute is discarded from the anchor tag. I have tried repeating the (\s(src|href|style)=['\"][^'\"]*['\"]) block using the * operator, the {3} quantifier and more, but in vain. Any suggestions?
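The behaviour described in the question can be reproduced in plain JavaScript. The sketch below assumes the g and i flags (the question does not say which flags were set on regex101) and applies the same pattern to the anchor tag from the sample. Capture group 2 appears only once in the pattern, so it can hold a single attribute, and the greedy [^>]* in front of it pushes the match to the last whitelisted attribute in the tag. The variable names are illustrative only.

```javascript
// The pattern from the question; the "gi" flags are an assumption.
const questionPattern =
  /<([a-z][a-z0-9]*)(?:[^>]*(\s(src|href|style)=['"][^'"]*['"]))?[^>]*?(\/?)>/gi;

// Anchor tag from the sample text, carrying two whitelisted attributes.
const anchorTag = '<a style="margin:0px;" href="http://www.germany.travel/">';

// Group 2 captures only one attribute (the last one), so "style" is lost.
const anchorResult = anchorTag.replace(questionPattern, '<$1$2$4>');

console.log(anchorResult); // <a href="http://www.germany.travel/">
```

Repeating the group with * or {3} does not help for the same reason: a capture group that matches several times still stores only its final match.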

Recommended answer

@AhmadAhsan here is a demo that fixes your issue using DOM manipulation: https://jsfiddle.net/pu1hsdgn/

<script src="https://code.jquery.com/jquery-1.9.1.js"></script>
<script>
    // Attributes that are allowed to remain on any element.
    var whitelist = ["src", "href", "style"];

    $(document).ready(function () {
        // Parse the markup into a detached <div>, strip every attribute
        // that is not whitelisted, and return the cleaned HTML string.
        function foo(contents) {
            var temp = $(document.createElement('div')).html(contents);

            temp.find('*').each(function () {
                var attributes = this.attributes;
                var i = attributes.length;
                // Walk backwards: removing an attribute shrinks the live list.
                while (i--) {
                    var attr = attributes[i];
                    if ($.inArray(attr.name, whitelist) === -1) {
                        this.removeAttributeNode(attr);
                    }
                }
            });
            return temp.html();
        }

        var raw = '<title>Hello World</title><div style="margin:0px;" fadeout"="" class="xyz"><img src="abc.jpg" alt="" /><p style="margin-bottom:10px;">The event is celebrating its 50th anniversary K&ouml;&nbsp;<a href="http://www.germany.travel/" style="margin:0px;">exhibition grounds in Cologne</a>.</p><p style="padding:0px;"></p><p style="color:black;"><strong>A festival for art lovers</strong></p></div>';
        alert(foo(raw));
    });
</script>
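If a regex-only approach is still preferred (for example where no DOM is available), one possible sketch is a two-pass replacement: the first pattern finds each opening tag, and a second global pattern collects every whitelisted attribute inside it. This is not part of the answer above; the names OPEN_TAG, ALLOWED_ATTR and stripAttributes are hypothetical. It assumes quoted attribute values with no ">" inside them and does not handle comments or script content, so the DOM approach remains the more robust choice.

```javascript
// Pass 1: an opening tag, split into name, raw attribute text, optional "/".
const OPEN_TAG = /<([a-z][a-z0-9]*)([^>]*?)(\/?)>/gi;

// Pass 2: every whitelisted attribute with a quoted value.
const ALLOWED_ATTR = /\s(?:src|href|style)=(?:"[^"]*"|'[^']*')/gi;

// Rebuild each opening tag from its name plus all whitelisted attributes.
function stripAttributes(html) {
  return html.replace(OPEN_TAG, (match, name, attrs, slash) => {
    // With the g flag, match() returns all matches, or null if there are none.
    const kept = attrs.match(ALLOWED_ATTR) || [];
    return '<' + name + kept.join('') + slash + '>';
  });
}

const sample =
  '<div fadeout"="" style="margin:0px;" class="xyz">' +
  '<img src="abc.jpg" alt="" />' +
  '<a style="margin:0px;" href="http://www.germany.travel/">Cologne</a>' +
  '</div>';

console.log(stripAttributes(sample));
// <div style="margin:0px;"><img src="abc.jpg"/><a style="margin:0px;" href="http://www.germany.travel/">Cologne</a></div>
```

Because the attribute pattern runs globally over the text of a single tag, both style and href survive on the anchor, which is the case the single-group expression in the question could not handle.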
