Best regex to catch XSS (Cross-site Scripting) attack (in Java)?

Problem description

Jeff actually posted about this in Sanitize HTML. But his example is in C# and I'm actually more interested in a Java version. Does anyone have a better version for Java? Is his example good enough to just convert directly from C# to Java?

[Update] I have put a bounty on this question because SO wasn't as popular when I asked the question as it is today (*). As for anything related to security, the more people look into it, the better it is!

(*) In fact, I think it was still in closed beta

Solution

Don't do this with regular expressions. Remember, you're not protecting just against valid HTML; you're protecting against the DOM that web browsers create. Browsers can be tricked into producing valid DOM from invalid HTML quite easily.

For example, see this list of obfuscated XSS attacks. Are you prepared to tailor a regex to prevent this real-world attack on Yahoo and Hotmail, which works on IE6/7/8?

<HTML><BODY>
<?xml:namespace prefix="t" ns="urn:schemas-microsoft-com:time">
<?import namespace="t" implementation="#default#time2">
<t:set attributeName="innerHTML" to="XSS&lt;SCRIPT DEFER&gt;alert(&quot;XSS&quot;)&lt;/SCRIPT&gt;">
</BODY></HTML>

How about this attack that works on IE6?

<TABLE BACKGROUND="javascript:alert('XSS')">

How about attacks that are not listed on this site? The problem with Jeff's approach is that it's not a whitelist, as claimed. As someone on that page adeptly notes:

The problem with it, is that the html must be clean. There are cases where you can pass in hacked html, and it won't match it, in which case it'll return the hacked html string as it won't match anything to replace. This isn't strictly whitelisting.
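To make that failure mode concrete, here is a minimal sketch of a blacklist-style regex filter. It is not Jeff's code; the class name `NaiveRegexFilter` and the pattern are hypothetical, chosen only to show what happens when the input does not match what the regex expects.

```java
import java.util.regex.Pattern;

// Hypothetical blacklist filter, for illustration only. Do not use it for sanitizing.
final class NaiveRegexFilter {
    // Strips <script>...</script> blocks, ignoring case and spanning line breaks.
    private static final Pattern SCRIPT_BLOCK = Pattern.compile(
            "<script[^>]*>.*?</script>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static String stripScripts(String html) {
        // If nothing matches, replaceAll returns the input unchanged.
        return SCRIPT_BLOCK.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String obvious = "<p>hi</p><SCRIPT>alert('XSS')</SCRIPT>";
        String attributeAttack = "<TABLE BACKGROUND=\"javascript:alert('XSS')\">";

        // The textbook payload is removed.
        System.out.println(stripScripts(obvious));
        // The attribute-based payload contains no <script> tag, so it passes through as-is.
        System.out.println(stripScripts(attributeAttack));
    }
}
```

The second call returns the attack string untouched, because there was nothing for the pattern to replace. Adding more patterns only moves the problem to the next payload nobody thought of.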

I would suggest a purpose-built tool like AntiSamy. It works by actually parsing the HTML, then traversing the resulting DOM and removing anything that's not in the configurable whitelist. The major difference from the regex approach is the ability to gracefully handle malformed HTML.
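The traverse-and-whitelist step can be sketched with nothing but the JDK. This is not AntiSamy's implementation: the class `WhitelistWalker`, its tag and attribute lists, and the wrapper-root convention are all assumptions made for illustration. It also uses the JDK's XML parser as a stand-in, which rejects malformed markup instead of repairing it the way a real HTML parser would.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical whitelist walker; the allowed sets below are illustrative only.
final class WhitelistWalker {
    static final Set<String> ALLOWED_TAGS = Set.of("div", "p", "b", "i", "table", "tr", "td");
    static final Set<String> ALLOWED_ATTRS = Set.of("title");

    // Removes every attribute and descendant of 'element' that is not whitelisted.
    static void clean(Element element) {
        NamedNodeMap attrs = element.getAttributes();
        for (int i = attrs.getLength() - 1; i >= 0; i--) {
            String name = attrs.item(i).getNodeName();
            if (!ALLOWED_ATTRS.contains(name.toLowerCase(Locale.ROOT))) {
                element.removeAttribute(name);
            }
        }
        // Copy the child list first, because removing nodes mutates the live NodeList.
        NodeList live = element.getChildNodes();
        List<Node> children = new ArrayList<>();
        for (int i = 0; i < live.getLength(); i++) {
            children.add(live.item(i));
        }
        for (Node child : children) {
            if (child instanceof Element) {
                Element e = (Element) child;
                if (ALLOWED_TAGS.contains(e.getTagName().toLowerCase(Locale.ROOT))) {
                    clean(e);
                } else {
                    element.removeChild(e); // drops the element and its whole subtree
                }
            } else if (child.getNodeType() != Node.TEXT_NODE) {
                element.removeChild(child); // comments, processing instructions, CDATA
            }
        }
    }

    // Parses well-formed markup and cleans it. The root is assumed to be a wrapper
    // element supplied by the caller, so its tag name is not checked.
    static Element parseAndClean(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations so untrusted input cannot pull in external entities.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            Element root = doc.getDocumentElement();
            clean(root);
            return root;
        } catch (Exception e) {
            throw new IllegalArgumentException("Input could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        Element root = parseAndClean(
                "<div><table background=\"javascript:alert('XSS')\"><tr><td>cell</td></tr></table>"
                + "<script>alert(1)</script></div>");
        System.out.println(root.getElementsByTagName("script").getLength());
    }
}
```

The key design point is that the decision is made per node on the parsed tree: anything not explicitly allowed is dropped, so an unknown payload is removed by default rather than passed through by default.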

The best part is that AntiSamy is actually unit tested against all the XSS attacks on the above site. Besides, what could be easier than this API call:

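import org.owasp.validator.html.AntiSamy;
import org.owasp.validator.html.CleanResults;
import org.owasp.validator.html.Policy;
import org.owasp.validator.html.PolicyException;
import org.owasp.validator.html.ScanException;

public String toSafeHtml(String html) throws ScanException, PolicyException {
    // POLICY_FILE: location of the AntiSamy policy XML (the whitelist), defined elsewhere
    Policy policy = Policy.getInstance(POLICY_FILE);
    AntiSamy antiSamy = new AntiSamy();
    // Parse the input and drop everything the policy does not allow
    CleanResults cleanResults = antiSamy.scan(html, policy);
    return cleanResults.getCleanHTML().trim();
}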
