PHP Security: how can encoding be misused?

Question

From this excellent "UTF-8 all the way through" question, I read about this:

Unfortunately, you should verify every submitted string as being valid UTF-8 before you try to store it or use it anywhere. PHP's mb_check_encoding() does the trick, but you have to use it religiously. There's really no way around this, as malicious clients can submit data in whatever encoding they want, and I haven't found a trick to get PHP to do this for you reliably.

Now, I'm still learning the quirks of encoding, and I'd like to know exactly what malicious clients can do to abuse encoding. What can one achieve? Can somebody give an example? Let's say I save the user input into a MySQL database, or I send it through e-mail, how can a user create harm if I do not use the mb_check_encoding functionality?

Recommended answer

how can a user create harm if I do not use the mb_check_encoding functionality?

It's about overlong encodings.

Due to an unfortunate quirk of UTF-8 design, it is possible to make byte sequences that, if parsed with a naïve bit-packing decoder, would result in the same character as a shorter sequence of bytes - including a single ASCII character.

For example the character < is usually represented as byte 0x3C, but could also be represented using the overlong UTF-8 sequence 0xC0 0xBC (or even more redundant 3- or 4-byte sequences).
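To make the bit arithmetic concrete, here is a minimal sketch of the kind of naive bit-packing decoder described above. The function name `naive_decode_2byte` is hypothetical and only handles 2-byte sequences; it illustrates the flaw and is not how any particular browser or library decodes UTF-8.

```php
<?php
// Hypothetical helper: a naive decoder for 2-byte UTF-8 sequences. It only
// unpacks the payload bits and never checks that the shortest form was used.
function naive_decode_2byte(string $bytes): int
{
    $lead = ord($bytes[0]); // expected bit pattern 110xxxxx
    $cont = ord($bytes[1]); // expected bit pattern 10xxxxxx

    // Keep the low 5 bits of the lead byte and the low 6 bits of the
    // continuation byte, then pack them into one code point.
    return (($lead & 0x1F) << 6) | ($cont & 0x3F);
}

// Overlong form of "<": all five payload bits of the lead byte are zero.
$overlong = "\xC0\xBC";
$codePoint = naive_decode_2byte($overlong);

printf("U+%04X decodes to '%s'\n", $codePoint, chr($codePoint));
```

A strict decoder would reject this input, because a 2-byte sequence must encode a code point of at least U+0080.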

If you take this input and handle it in a Unicode-oblivious byte-based tool, then any character processing step being used in that tool may be evaded. The canonical example would be submitting 0xC0 0xBC to PHP, which has native byte strings. The typical use of htmlspecialchars to HTML-encode the character < would fail here because the expected byte sequence 0x3C is not present. So the output of the script would still include the overlong-encoded <, and any browser reading that output could potentially read the sequence 0xC0 0xBC 0x73 0x63 0x72 0x69 0x70 0x74 as <script and hey presto! XSS.
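The evasion can be reproduced with a short sketch. One assumption is needed: `htmlspecialchars` is called with a single-byte charset (`ISO-8859-1`) so that it behaves as a purely byte-oriented escaper. When called with a UTF-8 charset, the PHP manual documents that it returns an empty string for invalid input unless `ENT_IGNORE` or `ENT_SUBSTITUTE` is set, so current defaults would not behave this way.

```php
<?php
// Overlong 2-byte form of "<" followed by the ASCII bytes of "script".
$payload = "\xC0\xBCscript";

// Assumption: the escaper is told the data is single-byte, so it simply
// searches for the literal byte 0x3C and the other special characters.
$escaped = htmlspecialchars($payload, ENT_QUOTES, 'ISO-8859-1');

// The byte 0x3C never appears in the payload...
var_dump(strpos($payload, '<') === false);

// ...so the escaper has nothing to replace and the bytes pass through as-is.
var_dump($escaped === $payload);

// For comparison, a real "<" is escaped as expected.
var_dump(htmlspecialchars('<script', ENT_QUOTES, 'ISO-8859-1') === '&lt;script');
```

Whether the unescaped bytes then become markup depends entirely on how the consumer decodes them, which is why the problem only showed up in browsers that accepted overlongs.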

Overlongs have been banned since way back and modern browsers no longer permit them. But this was a genuine problem for IE and Opera for a long time, and there's no guarantee every browser is going to get it right in future. And of course this is only one example - any place where a byte-oriented tool processes Unicode strings you've potentially got similar problems. The best approach, therefore, is to remove all overlongs at the earliest input phase.
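One way to apply this at the input boundary is sketched below. Note that it rejects malformed input outright rather than stripping overlongs, which is a deliberate simplification. The helper names `is_valid_utf8` and `require_utf8` are hypothetical; `mb_check_encoding` is the function named in the question, and the `preg_match('//u', ...)` fallback assumes PCRE refuses malformed UTF-8 subjects when mbstring is not loaded.

```php
<?php
// Hypothetical helper: returns true only for well-formed UTF-8.
function is_valid_utf8(string $value): bool
{
    if (function_exists('mb_check_encoding')) {
        return mb_check_encoding($value, 'UTF-8');
    }
    // Fallback assumption: with the /u modifier, preg_match returns false
    // when the subject is not valid UTF-8, and 1 when the empty pattern matches.
    return preg_match('//u', $value) === 1;
}

// Hypothetical helper: refuse bad input instead of trying to repair it.
function require_utf8(string $value): string
{
    if (!is_valid_utf8($value)) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $value;
}

$samples = [
    'plain'    => 'hello',
    'accented' => "caf\xC3\xA9",      // well-formed 2-byte sequence
    'overlong' => "\xC0\xBCscript",   // overlong encoding of "<"
];

foreach ($samples as $label => $bytes) {
    echo $label, ': ', is_valid_utf8($bytes) ? 'valid' : 'rejected', "\n";
}
```

Calling something like `require_utf8` on every request parameter before it reaches the database, the mailer, or an escaping function keeps later byte-oriented steps from ever seeing an overlong sequence.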