Secure XSS cleaning function (updated regularly)

Question

I've been hunting around the net now for a few days trying to figure this out but getting conflicting answers.

Is there a library, class or function for PHP that securely sanitizes/encodes a string against XSS? It needs to be updated regularly to counter new attacks.

I have a few use cases:

Use case 1) I have a plain text field, say for a First Name or Last Name

  • User enters text into field and submits the form
  • Before this is saved to the database I want to a) trim any whitespace off the front and end of the string, and b) strip all HTML tags from the input. It's a name text field, they shouldn't have any HTML in it.
  • Then I will save this to the database with PDO prepared statements.

I'm thinking I could just do trim() and strip_tags(), then use a Sanitize Filter or RegEx with a whitelist of characters. Do they really need characters like ! and ? or < > in their name? Not really.
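
The steps in use case 1 could be sketched roughly as follows. This is only an illustration of the approach described in the question, not a vetted implementation: the function names cleanName and saveName, the whitelist itself (letters, combining marks, spaces, apostrophes and hyphens) and the users table with its first_name column are all assumptions made up for this example.

```php
<?php
// Hypothetical helper: strip tags, apply a character whitelist, then trim.
function cleanName(string $raw): string {
    $name = strip_tags($raw);
    // Assumed whitelist: Unicode letters and marks, space, apostrophe, hyphen.
    // preg_replace() returns null on invalid UTF-8, so fall back to ''.
    $name = preg_replace("/[^\\p{L}\\p{M} '\\-]/u", '', $name) ?? '';
    return trim($name);
}

// Hypothetical persistence step using a PDO prepared statement.
// The table and column names are placeholders.
function saveName(PDO $pdo, string $name): void {
    $stmt = $pdo->prepare('INSERT INTO users (first_name) VALUES (:name)');
    $stmt->execute([':name' => $name]);
}

echo cleanName("  <b>Anne-Marie</b> O'Neil! "), "\n";
```

Note that this kind of input filtering only tidies the stored value. It does not replace encoding the value again when it is written into HTML.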

Use case 2) When outputting the contents from a previously saved database record (or from a previously submitted form) to the View/HTML I want to thoroughly clean it for XSS. NB: It may or may not have gone through the filtering step in use case 1 as it could be a different type of input, so assume no sanitizing has been done.

Initially I thought HTMLPurifier would do the job, but it seems it is not what I need, judging by the reply I got when I posed the question to their support:

Here's the litmus test: if a user submits <b>foo</b>, should it show up as the literal text <b>foo</b> or as foo in bold? If the former, you don't need HTML Purifier.

So I'd rather it showed up as <b>foo</b> because I don't want any HTML displayed for a simple text field or any JavaScript executing.

So I've been hunting around for a function that will do it all for me. I stumbled across the xss_clean method used by Kohana 3.0 which I'm guessing works but it's only if you want to keep the HTML. It's now deprecated from Kohana 3.1 as they've replaced it with HTMLPurifier. So I'm guessing you're supposed to do HTML::chars() instead which only does this code:

public static function chars($value, $double_encode = TRUE)
{
    return htmlspecialchars( (string) $value, ENT_QUOTES, Kohana::$charset, $double_encode);
}

Now apparently you're supposed to use htmlentities instead, as mentioned in quite a few places on Stack Overflow (for example https://stackoverflow.com/questions/3623236/htmlspecialchars-vs-htmlentities-when-concerned-with-xss), because it's more secure than htmlspecialchars.

  • So how do I use htmlentities properly?
  • Is that all I need?
  • How does it protect against hex, decimal and base64 encoded values being sent from the attacks listed here?

Now I see that the 3rd parameter for the htmlentities method is the charset to be used in conversion. Now my site/db is in UTF-8, but perhaps the form submitted data was not UTF-8 encoded, maybe they submitted ASCII or HEX so maybe I need to convert it to UTF-8 first? That would mean some code like:

$encoding = mb_detect_encoding($input);
$input = mb_convert_encoding($input, 'UTF-8', $encoding);
$input = htmlentities($input, ENT_QUOTES, 'UTF-8');

Yes or no? Then I'm still not sure how to protect against the hex, decimal and base64 possible XSS inputs...

If there's some library or open source PHP framework that can do XSS protection properly I'd be interested to see how they do it in code.

Any help much appreciated, sorry for the long post!

Recommended answer

To answer the bold question: Yes, there is. It's called htmlspecialchars.

"It needs to be updated regularly to counter new attacks."

The right way to prevent XSS attacks is not to counter specific attacks or to filter/sanitize data, but to encode properly, everywhere.

htmlspecialchars (or htmlentities), combined with a reasonable choice of character encoding (i.e. UTF-8) and an explicit specification of that encoding, is sufficient to protect against all XSS attacks. Fortunately, calling htmlspecialchars without an explicit encoding (before PHP 5.4 it then assumes ISO-8859-1; from PHP 5.4 on the default is UTF-8) happens to work out for UTF-8, too. If you want to make that explicit, create a helper function:

// Don't forget to specify UTF-8 as the document's encoding
function htmlEncode($s) {
    return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
}
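
As a rough usage illustration, the helper can be applied to every value at the point where it is written into HTML, both in text and in quoted attribute values. The variable names and sample inputs below are made up for this sketch.

```php
<?php
// Same helper as above, repeated so this snippet runs on its own.
function htmlEncode($s) {
    return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
}

$name  = '<b>foo</b>';                // hypothetical user input
$title = '" onmouseover="alert(1)';   // hypothetical attribute-breaking input

// Text context: the tags are shown literally instead of being rendered.
echo '<p>Hello, ' . htmlEncode($name) . "</p>\n";

// Quoted attribute context: ENT_QUOTES encodes both quote characters,
// so the value cannot break out of the attribute.
echo '<input type="text" value="' . htmlEncode($title) . '">' . "\n";
```

With this, a submitted <b>foo</b> shows up as the literal text <b>foo</b>, which is the behaviour the question asks for.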

Oh, and to address the form worries: don't try to detect encodings; it's bound to fail. Instead, serve the form in UTF-8. Every browser will then send user input in UTF-8.

"(...) you're supposed to use htmlentities because htmlspecialchars is vulnerable to UTF-7 XSS exploit."

The UTF-7 XSS exploit can only be applied if the browser thinks a document is encoded in UTF-7. Specifying the document encoding as UTF-8 (in the HTTP header/a meta tag right after <head>) prevents this.
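
One way to declare the encoding in both places is sketched here. The function names sendUtf8Header and renderPage and the page skeleton are invented for this example; only header() and htmlspecialchars() are PHP built-ins.

```php
<?php
function htmlEncode($s) {
    return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
}

// Declare UTF-8 in the HTTP header. Must run before any output is sent.
function sendUtf8Header(): void {
    if (!headers_sent()) {
        header('Content-Type: text/html; charset=utf-8');
    }
}

// Declare UTF-8 again in a meta tag right after <head>,
// and encode the user-supplied text on output.
function renderPage(string $userText): string {
    return "<!DOCTYPE html>\n<html>\n<head>\n"
         . "<meta charset=\"utf-8\">\n"
         . "<title>Demo</title>\n</head>\n<body>\n"
         . '<p>' . htmlEncode($userText) . "</p>\n"
         . "</body>\n</html>\n";
}

sendUtf8Header();
echo renderPage('<script>alert(1)</script>');
```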

"Also if I don't detect the encoding, what's to stop an attacker downloading the html file, then altering it to UTF-7 or some other encoding, then submitting the POST request back to my server from the altered html page?"

This attack scenario is unnecessarily complex. The attacker could just craft a UTF-7 string, no need to download anything.

If you accept the attacker's POST (i.e. you're accepting anonymous public user input), your server will just interpret the UTF-7 string as a weird UTF-8 one. That is not a problem; the attacker's post will just show up garbled. The attacker could achieve the same effect (sending strange text) by submitting "grfnlk" a hundred times.

"If my method only works for UTF-8 then the XSS attack will get through, no?"

No, it won't. Encodings are not magic. An encoding is just a way to interpret a binary string. For example, the string "ö" is encoded as (hexadecimal) 2B 41 50 59 in UTF-7 (and C3 B6 in UTF-8). Decoding 2B 41 50 59 as UTF-8 yields "+APY": harmless, seemingly random characters.
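
The byte-level point can be checked with a few lines of PHP. This is a small sketch; the UTF-7 script payload is a made-up sample in which +ADw- and +AD4- are the UTF-7 spellings of < and >.

```php
<?php
$utf8 = "\xC3\xB6";           // "ö" as UTF-8 bytes
$utf7 = hex2bin('2B415059');  // "ö" as UTF-7 bytes

echo bin2hex($utf8), "\n";    // the UTF-8 bytes in hex
echo $utf7, "\n";             // the UTF-7 bytes read as UTF-8/ASCII text

// A UTF-7 encoded script tag is plain ASCII when read as UTF-8.
// It contains no <, >, & or quotes, so htmlspecialchars leaves it unchanged,
// and a browser that was told the page is UTF-8 shows it as literal text.
$payload = '+ADw-script+AD4-alert(1)+ADw-/script+AD4-';
$encoded = htmlspecialchars($payload, ENT_QUOTES, 'UTF-8');
echo $encoded, "\n";
```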

"Also how does htmlentities protect against HEX or other XSS attacks?"

Hexadecimal data will be output as just that. An attacker sending "3C" will post the message "3C". "3C" can only become < if you actively interpret hexadecimal input differently, for example by mapping it to Unicode code points and then outputting those. That just means that if you accept data in anything other than plain UTF-8 (for example base32-encoded UTF-8), you first have to unpack your encoding and then apply htmlspecialchars before including the result in HTML.
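
A short sketch of the "unpack first, then encode" order follows. PHP has no built-in base32 decoder, so this example assumes hex and base64 inputs instead; the sample values are invented.

```php
<?php
function htmlEncode($s) {
    return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
}

// Plain hex text is output as-is: "3C" stays "3C".
$asText = htmlEncode('3C');

// Only if the application itself decodes the input does "<" appear,
// and encoding after decoding makes it harmless again.
$fromHex = htmlEncode(hex2bin('3C'));

// Same order for base64: decode first, then encode for HTML.
// "PHNjcmlwdD4=" is the base64 form of "<script>".
$fromBase64 = htmlEncode(base64_decode('PHNjcmlwdD4='));

echo $asText, "\n", $fromHex, "\n", $fromBase64, "\n";
```

Encoding before decoding would be the wrong order, since the decoded bytes would then reach the page unescaped.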
