Sanitize/Rewrite HTML on the Client Side
Question
I need to display external resources loaded via cross-domain requests and make sure that only "safe" content is displayed.
I could use Prototype's String#stripScripts to remove script blocks, but handlers such as onclick or onerror are still there.
Is there any library which can at least
- strip script blocks,
- kill DOM handlers,
- remove blacklisted tags (e.g. embed or object)?
Are there any JavaScript-related links and examples out there?
Answer
Update 2016: There is now a Google Closure package based on the Caja sanitizer. It has a cleaner API, was rewritten to take into account the APIs available on modern browsers, and interacts better with Closure Compiler.
Shameless plug: see caja/plugin/html-sanitizer.js for a client-side HTML sanitizer that has been thoroughly reviewed.
It is whitelist-based, not blacklist-based, but the whitelists are configurable as per CajaWhitelists.
If you want to remove all tags, then do the following:
var tagBody = '(?:[^"\'>]|"[^"]*"|\'[^\']*\')*';

var tagOrComment = new RegExp(
    '<(?:'
    // Comment body.
    + '!--(?:(?:-*[^->])*--+|-?)'
    // Special "raw text" elements whose content should be elided.
    + '|script\\b' + tagBody + '>[\\s\\S]*?</script\\s*'
    + '|style\\b' + tagBody + '>[\\s\\S]*?</style\\s*'
    // Regular name
    + '|/?[a-z]'
    + tagBody
    + ')>',
    'gi');
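function removeTags(html) {
  var oldHtml;
  // Repeat until nothing changes: removing one tag can splice
  // the surrounding text into a new tag.
  do {
    oldHtml = html;
    html = html.replace(tagOrComment, '');
  } while (html !== oldHtml);
  // Escape any remaining '<' so that it cannot start a tag.
  return html.replace(/</g, '&lt;');
}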
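The loop matters because a single global replace never rescans text it has already passed. The sketch below illustrates this with a deliberately simplified pattern; `simpleTag`, `stripOnce`, and `stripUntilStable` are hypothetical names used only for this illustration and are much weaker than the `tagOrComment` pattern from the answer.

```javascript
// Simplified, hypothetical tag pattern for illustration only.
// It is NOT a substitute for the tagOrComment pattern in the answer.
var simpleTag = /<\/?[a-z][^>]*>/gi;

// One pass: removes the tags that are visible right now.
function stripOnce(html) {
  return html.replace(simpleTag, '');
}

// Fixed point: keep stripping until the string stops changing.
function stripUntilStable(html) {
  var previous;
  do {
    previous = html;
    html = html.replace(simpleTag, '');
  } while (html !== previous);
  return html;
}

// Removing the inner <b> tags splices the pieces into a script element.
var nested = '<<b>script>alert(1)<<b>/script>';

console.log(stripOnce(nested));        // <script>alert(1)</script>
console.log(stripUntilStable(nested)); // alert(1)
```

With one pass the output is a working script element; iterating until the string is stable removes it.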
People will tell you that you can create an element, assign innerHTML, then get the innerText or textContent, and then escape entities in that. Do not do that. It is vulnerable to XSS injection since <img src=bogus onerror=alert(1337)> will run the onerror handler even if the node is never attached to the DOM.
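If showing the untrusted markup as literal text is acceptable, one option that never parses it as HTML is to escape the string directly. This is a minimal sketch, not part of Caja or the Closure package; `escapeHtml` is a hypothetical helper name.

```javascript
// Hypothetical helper: escape the characters that are significant in HTML
// so the input is displayed as text instead of being parsed as markup.
function escapeHtml(text) {
  var entities = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;'
  };
  return String(text).replace(/[&<>"']/g, function (ch) {
    return entities[ch];
  });
}

var untrusted = '<img src=bogus onerror=alert(1337)>';
var safe = escapeHtml(untrusted);

console.log(safe); // &lt;img src=bogus onerror=alert(1337)&gt;
```

Because the string is never assigned to innerHTML in unescaped form, no element is created and the onerror handler has nothing to attach to.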