Does this set of regular expressions FULLY protect against cross site scripting?

Question

What's an example of something dangerous that would not be caught by the code below?

After some of the comments, I added another line, marked with a comment in the code below. See Vinko's comment on David Grant's answer. So far only Vinko has answered the question, which asks for specific examples that would slip through this function. Vinko provided one, but I've edited the code to close that hole. If anyone else can think of another specific example, you'll have my vote!

using System.Text.RegularExpressions;

// Enclosing class added only so the snippet compiles; its name is illustrative.
public static class HtmlFilter
{
    // Blacklist filter: splices "SAFE" into known-dangerous tag names, the
    // "javascript" keyword and common event-handler attribute names.
    public static string strip_dangerous_tags(string text_with_tags)
    {
        string s = Regex.Replace(text_with_tags, @"<script", "<scrSAFEipt", RegexOptions.IgnoreCase);
        s = Regex.Replace(s, @"</script", "</scrSAFEipt", RegexOptions.IgnoreCase);
        s = Regex.Replace(s, @"<object", "<objSAFEct", RegexOptions.IgnoreCase);
        s = Regex.Replace(s, @"</object", "</objSAFEct", RegexOptions.IgnoreCase);
        // ADDED AFTER THIS QUESTION WAS POSTED
        s = Regex.Replace(s, @"javascript", "javaSAFEscript", RegexOptions.IgnoreCase);

        s = Regex.Replace(s, @"onabort", "onSAFEabort", RegexOptions.IgnoreCase);
        s = Regex.Replace(s, @"onblur", "onSAFEblur", RegexOptions.IgnoreCase);
        s = Regex.Replace(s, @"onchange", "onSAFEchange", RegexOptions.IgnoreCase);
        s = Regex.Replace(s, @"onclick", "onSAFEclick", RegexOptions.IgnoreCase);
        s = Regex.Replace(s, @"ondblclick", "onSAFEdblclick", RegexOptions.IgnoreCase);
        s = Regex.Replace(s, @"onerror", "onSAFEerror", RegexOptions.IgnoreCase);
        s = Regex.Replace(s, @"onfocus", "onSAFEfocus", RegexOptions.IgnoreCase);

        s = Regex.Replace(s, @"onkeydown", "onSAFEkeydown", RegexOptions.IgnoreCase);
        s = Regex.Replace(s, @"onkeypress", "onSAFEkeypress", RegexOptions.IgnoreCase);
        s = Regex.Replace(s, @"onkeyup", "onSAFEkeyup", RegexOptions.IgnoreCase);

        s = Regex.Replace(s, @"onload", "onSAFEload", RegexOptions.IgnoreCase);

        s = Regex.Replace(s, @"onmousedown", "onSAFEmousedown", RegexOptions.IgnoreCase);
        s = Regex.Replace(s, @"onmousemove", "onSAFEmousemove", RegexOptions.IgnoreCase);
        s = Regex.Replace(s, @"onmouseout", "onSAFEmouseout", RegexOptions.IgnoreCase);
        s = Regex.Replace(s, @"onmouseover", "onSAFEmouseover", RegexOptions.IgnoreCase);
        s = Regex.Replace(s, @"onmouseup", "onSAFEmouseup", RegexOptions.IgnoreCase);

        s = Regex.Replace(s, @"onreset", "onSAFEreset", RegexOptions.IgnoreCase);
        s = Regex.Replace(s, @"onresize", "onSAFEresize", RegexOptions.IgnoreCase);
        s = Regex.Replace(s, @"onselect", "onSAFEselect", RegexOptions.IgnoreCase);
        s = Regex.Replace(s, @"onsubmit", "onSAFEsubmit", RegexOptions.IgnoreCase);
        s = Regex.Replace(s, @"onunload", "onSAFEunload", RegexOptions.IgnoreCase);

        return s;
    }
}

Recommended answer

It's never enough – whitelist, don't blacklist

For example, a javascript: pseudo-URL can be obfuscated with HTML entities, you've forgotten about <embed>, and there are dangerous CSS properties like behavior and expression in IE.
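
These examples can be checked against a rough Python port of the question's filter. It is a simplified sketch, not the original C#: it splices "SAFE" after the first three characters of every blacklisted match instead of at the exact positions the C# replacements use, which changes what a defused word looks like but not which inputs come out untouched. The payloads are illustrative and contain none of the blacklisted words, so the filter returns them unchanged; `html.unescape` then mimics a browser decoding the entity-obfuscated `href` back into a `javascript:` URL. The `ontoggle` attribute on `<details>` is one more event handler the list does not cover.

```python
import html
import re

# Simplified port of the question's blacklist (assumption: "SAFE" is spliced
# in after the first three characters of each case-insensitive match).
BLACKLIST = [
    "<script", "</script", "<object", "</object", "javascript",
    "onabort", "onblur", "onchange", "onclick", "ondblclick", "onerror",
    "onfocus", "onkeydown", "onkeypress", "onkeyup", "onload",
    "onmousedown", "onmousemove", "onmouseout",
    "onmouseover",  # assumed intent of the question's repeated "onmouseup" line
    "onmouseup", "onreset", "onresize", "onselect", "onsubmit", "onunload",
]


def strip_dangerous_tags(text):
    for word in BLACKLIST:
        defused = word[:3] + "SAFE" + word[3:]
        text = re.sub(re.escape(word), defused, text, flags=re.IGNORECASE)
    return text


# Illustrative payloads that contain none of the blacklisted words.
PAYLOADS = [
    '<a href="jav&#x61;script:alert(1)">click</a>',  # entity-obfuscated pseudo-URL
    '<embed src="evil.swf">',                        # tag missing from the list
    '<div style="width: expression(alert(1))">',     # old IE CSS expression
    '<details open ontoggle="alert(1)">',            # handler missing from the list
]

for payload in PAYLOADS:
    print(strip_dangerous_tags(payload) == payload)  # True: the filter changed nothing

# Browsers decode entities inside attribute values before acting on them.
print(html.unescape("jav&#x61;script:alert(1)"))  # javascript:alert(1)
```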

There are countless ways to evade filters, and such an approach is bound to fail. Even if you find and block every exploit possible today, new unsafe elements and attributes may be added in the future.

There are only two good ways to secure HTML:

  • convert it to text by replacing every < with &lt;.
    If you want to allow users to enter formatted text, you can use your own markup (e.g. Markdown, as SO does).

  • parse HTML into a DOM, check every element and attribute, and remove everything that is not whitelisted.
    You will also need to check the contents of allowed attributes like href (make sure that URLs use a safe protocol, and block all unknown protocols).
    Once you've cleaned up the DOM, generate new, valid HTML from it. Never work on HTML as if it were text, because invalid markup, comments, entities, etc. can easily fool your filter.
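
Both options can be sketched with Python's standard library. `to_text` is the first option: `html.escape` replaces `&`, `<` and `>` (and, by default, both quote characters) with entities. `sanitize` approximates the second: `html.parser.HTMLParser` is a tokenizer rather than a DOM builder, so this hypothetical sanitizer applies the whitelist to parser events and writes brand-new, properly nested markup instead of editing a tree. The allowed tags, attributes and schemes (`http`, `https`, `mailto`) are assumptions chosen for illustration, and the parser does not reproduce every browser parsing quirk, so a maintained sanitizer library is the safer choice in production.

```python
import html
import re
from html.parser import HTMLParser

# Illustrative whitelist (assumption): adjust to what users actually need.
ALLOWED_TAGS = {"a", "b", "i", "em", "strong", "p", "br"}
ALLOWED_ATTRS = {"a": {"href"}}
VOID_TAGS = {"br"}
DROP_WITH_CONTENT = {"script", "style"}
SAFE_SCHEMES = {"http", "https", "mailto"}
SCHEME_RE = re.compile(r"^([a-z][a-z0-9+.-]*):", re.IGNORECASE)


def to_text(user_input):
    """Option 1: show the input as plain text."""
    return html.escape(user_input)


def is_safe_url(url):
    # Browsers strip leading control characters/spaces and ignore tabs and
    # newlines anywhere in a URL; removing every such character errs on the
    # strict side before the scheme is inspected.
    compact = re.sub(r"[\x00-\x20]", "", url)
    match = SCHEME_RE.match(compact)
    # No scheme: a relative URL. Otherwise only whitelisted schemes pass.
    return match is None or match.group(1).lower() in SAFE_SCHEMES


class WhitelistSanitizer(HTMLParser):
    """Option 2: keep whitelisted tags/attributes, rebuild fresh markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # text arrives entity-decoded
        self.out = []
        self.open_tags = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # drop the tag itself; its text content is still kept
        kept = []
        for name, value in attrs:  # HTMLParser has already decoded entities
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            if name == "href" and not is_safe_url(value):
                continue
            kept.append(f' {name}="{html.escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in self.open_tags:
            return
        # Also close anything opened after `tag` so the output stays nested.
        while True:
            last = self.open_tags.pop()
            self.out.append(f"</{last}>")
            if last == tag:
                break

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(html.escape(data, quote=False))

    # Comments, doctypes and processing instructions fall through to
    # HTMLParser's no-op default handlers, so they are dropped.


def sanitize(fragment):
    parser = WhitelistSanitizer()
    parser.feed(fragment)
    parser.close()
    # Close whatever the input left open.
    parser.out.extend(f"</{tag}>" for tag in reversed(parser.open_tags))
    return "".join(parser.out)


print(to_text("<b>hi</b>"))  # &lt;b&gt;hi&lt;/b&gt;
print(sanitize('<p onclick="steal()">Hi <b>there</p>'))  # <p>Hi <b>there</b></p>
print(sanitize('<a href="jav&#x61;script:alert(1)">x</a><script>alert(2)</script>'))  # <a>x</a>
```

Re-escaping every text node and attribute value on output is what keeps decoded entities such as `&lt;` from turning back into markup.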

Also make sure your page declares its encoding, because there are exploits that take advantage of browsers auto-detecting the wrong encoding.
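
One well-known case (an illustration, not taken from the answer) is UTF-7: older versions of Internet Explorer could sniff a page with no declared charset as UTF-7, in which `+ADw-` and `+AD4-` encode `<` and `>`. The sketch below uses Python's built-in `utf-7` codec to show that such a payload contains no literal `<script`, so the question's first rule has nothing to match, while a UTF-7 decoder turns it into a script tag. Declaring the charset, for example with a `Content-Type: text/html; charset=utf-8` response header or `<meta charset="utf-8">`, removes the guesswork.

```python
import re

# Bytes an attacker might submit; every character is plain ASCII.
payload = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

# The question's first rule, applied to the text as a UTF-8 server sees it.
as_utf8 = payload.decode("utf-8")
filtered = re.sub(r"<script", "<scrSAFEipt", as_utf8, flags=re.IGNORECASE)
print(filtered == as_utf8)  # True: no literal "<script" to match

# What a browser that guessed UTF-7 would see instead.
print(payload.decode("utf-7"))  # <script>alert(1)</script>
```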
