Sanitize/Rewrite HTML on the Client Side

Question

I need to display external resources loaded via cross domain requests and make sure to only display "safe" content.

I could use Prototype's String#stripScripts to remove script blocks, but event handlers such as onclick or onerror would still be there.
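To make the problem concrete, here is a minimal sketch that approximates what String#stripScripts does and shows that event-handler attributes survive it. The regular expression is an assumption modeled on Prototype's script-fragment pattern, not copied from its source, and the stripScripts function here is a hypothetical stand-in, not Prototype's own method.

```javascript
// Approximation of Prototype's String#stripScripts (assumed pattern): it
// deletes <script>...</script> blocks and leaves every other tag and
// attribute untouched.
const SCRIPT_FRAGMENT = '<script[^>]*>([\\S\\s]*?)</script\\s*>';

function stripScripts(html) {
  return html.replace(new RegExp(SCRIPT_FRAGMENT, 'img'), '');
}

const input = '<p onclick="steal()">hi</p><script>alert(1)</script>' +
              '<img src=bogus onerror=alert(1337)>';
const stripped = stripScripts(input);

// The script block is gone, but both event handlers are still present:
// <p onclick="steal()">hi</p><img src=bogus onerror=alert(1337)>
console.log(stripped);
```

Stripping script blocks alone is therefore not enough: anything that can carry script (handlers, some URL attributes, plugin tags) has to be dealt with as well.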

Is there any library which can at least

  • strip script blocks,
  • kill DOM handlers,
  • remove blacklisted tags (e.g. embed or object).

Are there any JavaScript-related links or examples out there?

Solution

Update 2016: There is now a Google Closure package based on the Caja sanitizer.

It has a cleaner API, was rewritten to take into account APIs available on modern browsers, and interacts better with Closure Compiler.


Shameless plug: see caja/plugin/html-sanitizer.js for a client-side HTML sanitizer that has been thoroughly reviewed.

It is whitelist-based, not blacklist-based, but the whitelists are configurable as described in CajaWhitelists.
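To illustrate the whitelist-versus-blacklist distinction, here is a minimal sketch of a whitelist check applied to a single, already-parsed element. The policy object, its shape, and the names filterElement and isSafeUrl are hypothetical and are not Caja's API or whitelist format. The sketch also assumes that a real HTML parser has already split the markup into a tag name and entity-decoded attribute values, which is the hard part a reviewed library does for you.

```javascript
// Hypothetical whitelist policy, for illustration only (NOT Caja's format).
const POLICY = {
  elements: new Set(['a', 'b', 'i', 'em', 'strong', 'p', 'ul', 'ol', 'li']),
  attributes: { a: new Set(['href', 'title']) }, // per-element attribute whitelist
  urlAttributes: new Set(['href']),              // attributes whose value is a URL
};

// Accept only http(s)/mailto URLs, or relative URLs containing no colon.
// Assumes the value was already entity-decoded by a real HTML parser.
function isSafeUrl(value) {
  const v = value.trim();
  return /^(?:https?|mailto):/i.test(v) || !v.includes(':');
}

// Returns {tag, attrs} keeping only whitelisted attributes, or null when the
// element itself is not whitelisted. Anything not listed is dropped, so
// onclick, onerror, embed, object, etc. never have to be enumerated.
function filterElement(tag, attrs, policy = POLICY) {
  const name = tag.toLowerCase();
  if (!policy.elements.has(name)) return null;
  const allowed = policy.attributes[name] || new Set();
  const kept = {};
  for (const [rawKey, value] of Object.entries(attrs)) {
    const key = rawKey.toLowerCase();
    if (!allowed.has(key)) continue;                                  // not whitelisted
    if (policy.urlAttributes.has(key) && !isSafeUrl(value)) continue; // e.g. javascript:
    kept[key] = value;
  }
  return { tag: name, attrs: kept };
}

const link = filterElement('A', { HREF: 'javascript:alert(1)', title: 'x', onclick: 'steal()' });
const embed = filterElement('embed', { src: 'evil.swf' });
console.log(JSON.stringify(link)); // {"tag":"a","attrs":{"title":"x"}}
console.log(embed);                // null
```

The design point is that a new, unforeseen handler or plugin tag is rejected by default, whereas a blacklist would let it through until someone adds it to the list.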


If you want to remove all tags, then do the following:

var tagBody = '(?:[^"\'>]|"[^"]*"|\'[^\']*\')*';

var tagOrComment = new RegExp(
    '<(?:'
    // Comment body.
    + '!--(?:(?:-*[^->])*--+|-?)'
    // Special "raw text" elements whose content should be elided.
    + '|script\\b' + tagBody + '>[\\s\\S]*?</script\\s*'
    + '|style\\b' + tagBody + '>[\\s\\S]*?</style\\s*'
    // Regular name
    + '|/?[a-z]'
    + tagBody
    + ')>',
    'gi');
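// Strips tags repeatedly until the output stops changing, so that tags
// reassembled by an earlier removal (e.g. "<<b>script>") are also caught.
function removeTags(html) {
  var oldHtml;
  do {
    oldHtml = html;
    html = html.replace(tagOrComment, '');
  } while (html !== oldHtml);
  // Escape any remaining '<' so the result cannot open a new tag.
  return html.replace(/</g, '&lt;');
}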
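A quick way to see what removeTags does is to run it on a few inputs. The snippet below repeats the definitions above unchanged so that it runs on its own (for example in Node.js); the sample strings are made up for illustration. The nested input shows why the loop repeats: removing <b> and </b> on the first pass reassembles a complete <script> block, which the second pass then removes.

```javascript
var tagBody = '(?:[^"\'>]|"[^"]*"|\'[^\']*\')*';

var tagOrComment = new RegExp(
    '<(?:'
    // Comment body.
    + '!--(?:(?:-*[^->])*--+|-?)'
    // Special "raw text" elements whose content should be elided.
    + '|script\\b' + tagBody + '>[\\s\\S]*?</script\\s*'
    + '|style\\b' + tagBody + '>[\\s\\S]*?</style\\s*'
    // Regular name
    + '|/?[a-z]'
    + tagBody
    + ')>',
    'gi');

function removeTags(html) {
  var oldHtml;
  do {
    oldHtml = html;
    html = html.replace(tagOrComment, '');
  } while (html !== oldHtml);
  return html.replace(/</g, '&lt;');
}

// Tags are removed, script/style/comment bodies are dropped, stray '<' is escaped.
console.log(removeTags('a <b onclick="x()">bold</b> 1 < 2'));    // a bold 1 &lt; 2
console.log(removeTags('<img src=bogus onerror=alert(1337)>'));  // (empty string)
console.log(removeTags('<<b>script>alert(1)<</b>/script>'));      // (empty string)
console.log(removeTags('x<!-- hidden -->y<style>p{}</style>z')); // xyz
```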

People will tell you that you can create an element, assign the markup to its innerHTML, read back its innerText or textContent, and then escape entities in that. Do not do that. It is vulnerable to XSS injection, because <img src=bogus onerror=alert(1337)> will run the onerror handler even if the node is never attached to the DOM.
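If the goal is only to display untrusted markup as plain text, a safer route than the innerHTML round trip is to never parse it at all: escape it as a string, or, in a browser, assign it to an element's textContent, which does not parse HTML. A minimal sketch of the string-escaping route follows; escapeHtml is a hypothetical helper name, not a library API.

```javascript
// Escape the HTML-significant characters so untrusted text can be placed in
// element content or a quoted attribute value without being parsed as markup.
// '&' is replaced first so the ampersands introduced by the later
// replacements are not escaped a second time.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

const shown = escapeHtml('<img src=bogus onerror=alert(1337)>');
console.log(shown); // &lt;img src=bogus onerror=alert(1337)&gt;
```

Because the input is never handed to an HTML parser, no image is requested and no onerror handler can fire.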
