Why does JSoup remove element IDs?

Question

I'm using JSoup to sanitize some untrusted HTML. I discovered that if I call

String html = "<div id='foo'><script type='text/javascript'>alert('hello');</script></div>";
String cleanedHtml = Jsoup.clean(html, Whitelist.relaxed());

then cleanedHtml contains

<div></div>

So the <script> tag has correctly been removed, but mysteriously, so has the id attribute of the <div>. Is there any good reason why this should be removed or is it a bug?

Recommended answer

By default the id attribute is removed; add it as an allowable attribute:

Whitelist whitelist = Whitelist.relaxed().addAttributes("div", "id");
System.out.println(Jsoup.clean(html, whitelist));

=> <div id="foo"></div>

Is it a bug? Not AFAIC; it's in the source. IMO there are documentation bugs, though.

Is there "any good reason" why this should be removed? Not sure about that one, but attributes like this aren't structural: removing it doesn't alter the DOM. That's the thing about whitelists–they explicitly allow, and must be curated to match your precise needs.
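
To make the "explicitly allow" point concrete, here is a minimal sketch of allow-list attribute filtering in plain Java. It is a toy model, not jsoup's implementation: the class AttributeAllowList, its rule table, and its filter method are hypothetical names made up for illustration. The idea it shows is that anything without a matching rule gets dropped, which is why id disappears until you add a rule for it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy model of allow-list filtering; this is NOT jsoup's implementation.
final class AttributeAllowList {
    // Hypothetical rule table: tag name -> attribute names permitted on that tag.
    private final Map<String, Set<String>> allowed;

    AttributeAllowList(Map<String, Set<String>> allowed) {
        this.allowed = allowed;
    }

    // Keep only the attributes explicitly allowed for the tag; drop everything else.
    Map<String, String> filter(String tag, Map<String, String> attributes) {
        Set<String> permitted = allowed.getOrDefault(tag, Set.of());
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            if (permitted.contains(e.getKey())) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("id", "foo");
        attrs.put("onclick", "alert('hello')");

        // No rule for div: every attribute is dropped, including id.
        AttributeAllowList strict = new AttributeAllowList(Map.of());
        System.out.println(strict.filter("div", attrs)); // prints {}

        // Allow id on div: only id survives, onclick is still dropped.
        AttributeAllowList withId = new AttributeAllowList(Map.of("div", Set.of("id")));
        System.out.println(withId.filter("div", attrs)); // prints {id=foo}
    }
}
```

The default is "deny", so forgetting a rule fails safe: you lose a harmless attribute rather than letting a dangerous one through.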
