How safe is it to accept a pre-defined set of non-harmful HTML tags from a request?


Question


One of the first things I learned as a web developer was to never ever accept any HTML from the client. (Perhaps only if I HTML encode it.)
I use a WYSIWYG editor (TinyMCE) that outputs HTML. So far I have only used it on an admin page, but now I'd like to also use it on a forum. It has a BBCode module, but that seems to be incomplete. (It is possible that BBCode itself doesn't support everything I want it to.)

So, here's my idea:

I allow the client to directly POST some HTML code. Then, I check the code for sanity (well-formedness) and remove all tags, attributes, and CSS rules that are not allowed based on a pre-defined set of allowed tags and styles.
Obviously I would allow the stuff that can be outputted by the subset of TinyMCE functionality I use.

I would allow the following tags:
span, sub, sup, a, p, ul, ol, li, img, strong, em, br

With the following attributes:
style (for everything), href and title (for a), alt and src (for img)

And the following CSS rules:
color, font, font-size, font-weight, font-style, text-decoration

These cover everything that I need for formatting, and (as far as I know) don't present any security risk. Basically, enforcing well-formedness and leaving out any layout styles prevents anyone from breaking the layout of the site. Disallowing the script tag and the like prevents XSS.
(One exception: maybe I should allow width/height in a predefined range for images.)

Another advantage: this would save me from having to write or look for a BBCode-to-HTML converter.

What do you think?
Is this a secure thing to do?

(As far as I can see, StackOverflow also allows some basic HTML in the "About Me" field, so I think I'm not the first one to implement this.)

EDIT:

I found this answer which explains how to do this fairly easily.
And of course, no one should think about using regex for this.

The question itself is not tied to any language or technology, but in case you are wondering, I am writing this application in ASP.NET.

Answer

It's unclear which programming language you're using or prefer, but in Java there's Jsoup, a pretty slick HTML parser API which contains, among other things, an HTML cleaner based on a customizable whitelist of HTML tags and attributes (unfortunately no CSS rules, since that's completely outside the scope of an HTML parser). Here's a relevant extract from its site.

Sanitize untrusted HTML

Problem

You want to allow untrusted users to supply HTML for output on your website (e.g. as comment submission). You need to clean this HTML to avoid cross-site scripting (XSS) attacks.

Solution

Use the jsoup HTML Cleaner with a configuration specified by a Whitelist.

import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

String unsafe =
    "<p><a href='http://example.com/' onclick='stealCookies()'>Link</a></p>";
String safe = Jsoup.clean(unsafe, Whitelist.basic());
// now: <p><a href="http://example.com/" rel="nofollow">Link</a></p>

The Whitelist class itself contains several predefined whitelists which may be of use, like Whitelist#basic() and Whitelist#relaxed().
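
If you end up writing part of the filter yourself, as the question proposes, keep in mind that allowing href and src by name says nothing about their values: a javascript: URL in an href would still run script. Below is a minimal sketch of a scheme check using only java.net.URI. The class name LinkCheck, the method isAllowed, and the set of accepted schemes are assumptions made for illustration, not part of Jsoup or of the original question.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: accepts a URL only if its scheme is on a short allowlist.
final class LinkCheck {
    // Assumed allowlist; adjust to what the forum actually needs.
    private static final Set<String> ALLOWED_SCHEMES = Set.of("http", "https", "mailto");

    static boolean isAllowed(String url) {
        try {
            URI uri = new URI(url.trim());
            String scheme = uri.getScheme();
            // Relative URLs (no scheme) are rejected here to keep the sketch simple.
            return scheme != null
                && ALLOWED_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            // Anything that does not parse as a URI is treated as unsafe.
            return false;
        }
    }
}
```

Rejecting whatever fails to parse is a deliberate choice here: for untrusted input it is safer to drop a strange but harmless link than to guess how a browser would interpret it.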

For .NET, by the way, there's a Jsoup port named NSoup.
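
Since the cleaner does not look inside style values, the CSS properties listed in the question would need a separate pass over each style attribute. Here is a rough sketch of such a pass in plain Java, not a complete CSS parser. The class name StyleFilter, the method filter, and the very strict rule for accepted values are assumptions made for illustration; the property names come from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: keeps only allowlisted declarations from a style attribute value.
final class StyleFilter {
    // Property names taken from the question's list.
    private static final Set<String> ALLOWED = Set.of(
        "color", "font", "font-size", "font-weight", "font-style", "text-decoration");

    static String filter(String style) {
        List<String> kept = new ArrayList<>();
        for (String declaration : style.split(";")) {
            int colon = declaration.indexOf(':');
            if (colon < 0) {
                continue; // not a "property: value" pair
            }
            String property = declaration.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = declaration.substring(colon + 1).trim();
            if (ALLOWED.contains(property) && isPlainValue(value)) {
                kept.add(property + ": " + value);
            }
        }
        return String.join("; ", kept);
    }

    // Deliberately strict (an assumption of this sketch): letters, digits, spaces and
    // a few punctuation marks only, so url(...), expression(...), quotes, escapes
    // and comments are all rejected.
    private static boolean isPlainValue(String value) {
        return !value.isEmpty() && value.matches("[A-Za-z0-9 #%.,\\-]+");
    }
}
```

For example, StyleFilter.filter("color: red; position: absolute; font-size: 12px") keeps the color and font-size declarations and drops position. The value rule also rejects quoted font names, which is the price of keeping the check simple.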
