如何使Jsoup白名单接受某些属性内容 [英] How to make a Jsoup whitelist to accept certain attribute content

查看:1184
本文介绍了如何使Jsoup白名单接受某些属性内容的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用Jsoup和轻松的白名单。这似乎很完美,但我想保留嵌入式图像标签,如< img alt =src =data:; base64

I'm using Jsoup with relaxed whitelist. It seems perfect but I would like to keep the embedded images tags like <img alt="" src="data:;base64.

有没有办法修改白名单以接受那些img?

Is there a way to modify the whitelist to accept also those img?

编辑

如果我使用 Whitelist.relaxed()。addProtocols(img,src,data)那么那些img标签没有被删除。但它接受data:之后的任何内容,如果src内容以data:; base64开头,我想保留它们。是否可以使用jsoup?

If I use Whitelist.relaxed().addProtocols("img","src","data") then those img tags are not removed. But it accepts anything after "data:" and I would like just to keep them if src content starts with "data:;base64". Is it possible with jsoup?

推荐答案

您可以扩展白名单并覆盖 isSafeAttribute 执行自定义检查。因为没有办法扩展Whitelist.relaxed()直接,您将需要复制一些代码来设置相同的列表:

You can extend Whitelist and override isSafeAttribute to perform custom checks. As there's no way to extend Whitelist.relaxed() directly, you'll have to copy some code to set up the same list:

public class RelaxedPlusDataBase64Images extends Whitelist {
    public RelaxedPlusDataBase64Images() {
        //copied from Whitelist.relaxed()
        addTags("a", "b", "blockquote", "br", "caption", "cite", "code", "col",
                "colgroup", "dd", "div", "dl", "dt", "em", "h1", "h2", "h3", "h4", "h5", "h6",
                "i", "img", "li", "ol", "p", "pre", "q", "small", "strike", "strong",
                "sub", "sup", "table", "tbody", "td", "tfoot", "th", "thead", "tr", "u",
                "ul");
        addAttributes("a", "href", "title");
        addAttributes("blockquote", "cite");
        addAttributes("col", "span", "width");
        addAttributes("colgroup", "span", "width");
        addAttributes("img", "align", "alt", "height", "src", "title", "width");
        addAttributes("ol", "start", "type");
        addAttributes("q", "cite");
        addAttributes("table", "summary", "width");
        addAttributes("td", "abbr", "axis", "colspan", "rowspan", "width");
        addAttributes("th", "abbr", "axis", "colspan", "rowspan", "scope", "width");
        addAttributes("ul", "type");
        addProtocols("a", "href", "ftp", "http", "https", "mailto");
        addProtocols("blockquote", "cite", "http", "https");
        addProtocols("cite", "cite", "http", "https");
        addProtocols("img", "src", "http", "https");
        addProtocols("q", "cite", "http", "https");
    }

    @Override
    protected boolean isSafeAttribute(String tagName, Element el, Attribute attr) {
        return ("img".equals(tagName)
                && "src".equals(attr.getKey())
                && attr.getValue().startsWith("data:;base64")) ||
            super.isSafeAttribute(tagName, el, attr);
    }
}

由于您还没有提供代码,因此您需要使用解析或你正在消毒的HTML,我还没有测试过。

As you haven't provided the code you're using to parse or the HTML you're sanitizing, I haven't tested this.

这篇关于如何使Jsoup白名单接受某些属性内容的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆