Is it possible to get collisions with base64 encoding/decoding?

Question

A similar question was asked here: Is base64 encoding always one to one

And apparently the answer (to the similar question) is YES. I already know that, BUT I'd be curious to know the explanation for why these two strings appear to be equivalent after being Base64 decoded:

cwB0AGQAAG==

cwB0AGQAAA==

One more thing: when you take the decoded string and re-encode it, both re-encode to the same value: cwB0AGQAAA==

What's going on?

Recommended answer

base64 decoding is not one-to-one; more than one string can decode to the same bytes. What you're seeing is multiple ways to write the padded group at the end of the string.

base64 encodes bytes (8 bits each) into base 64. A character in base64 encodes 6 bits, so four base64 characters can handle three bytes. When the length of the input is not a multiple of three bytes, base64 uses = as a padding character to fill up the last group of four base64 characters. XXX= indicates that only the first two bytes of the group are to be used (where XXX represents three arbitrary base64 characters), while XX== indicates that only the first byte should be used.
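As a small illustration of the grouping rule above, the following sketch encodes the first one, two, and three bytes of the same input using Python's standard `base64` module. The helper name `encode_text` and the sample input `b"Man"` are chosen here for illustration and do not come from the original answer.

```python
import base64


def encode_text(data: bytes) -> str:
    """Return the standard base64 encoding of data as an ASCII string."""
    return base64.b64encode(data).decode("ascii")


# Encode prefixes of 1, 2 and 3 bytes to show how the padding changes.
examples = {n: encode_text(b"Man"[:n]) for n in (1, 2, 3)}

for n, encoded in examples.items():
    print(f"{n} byte(s) -> {encoded}")
# 1 byte(s) -> TQ==
# 2 byte(s) -> TWE=
# 3 byte(s) -> TWFu
```

One input byte yields `XX==`, two bytes yield `XXX=`, and a full three-byte group needs no padding.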

The last group in your example is AA==, which encodes a 0 byte. However, the AA part can encode 12 bits, of which the least significant four are ignored on decoding, so you can use any character from A-P and get the same result. When you use the encoder it always picks zeros for those four bits, so you get back AA==.
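The ignored bits can be checked directly. This sketch assumes Python's `base64.b64decode` with its default lenient settings, which does not reject non-zero trailing bits; stricter decoders in other libraries may refuse such input. The variable names are illustrative.

```python
import base64
import string

# The standard base64 alphabet: A-Z, a-z, 0-9, '+', '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# AA==, AB==, ... AP==: the second character only varies in its low four bits.
variants = ["A" + ch + "==" for ch in ALPHABET[:16]]
decoded_variants = {base64.b64decode(v) for v in variants}
print(decoded_variants)  # {b'\x00'}

# 'Q' has value 16, which changes one of the two bits that are actually used.
print(base64.b64decode("AQ=="))  # b'\x01'

# The two strings from the question decode to the same seven bytes.
question_a = base64.b64decode("cwB0AGQAAG==")
question_b = base64.b64decode("cwB0AGQAAA==")
print(question_a == question_b)  # True

# Re-encoding always zeroes the unused bits.
reencoded = base64.b64encode(question_a).decode("ascii")
print(reencoded)  # cwB0AGQAAA==
```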

Padding is actually even more complicated in base64. Technically you can leave out the = characters, since the length of the string already shows how many are missing (according to Wikipedia, not all decoders support this). Where padding is useful is that it allows base64 strings to be safely concatenated, since every group of four is interpreted the same way. However, this means that padding can also appear in the middle of a string, so a decoder that accepts this allows the same sequence of bytes to be encoded in all sorts of ways. Many decoders also ignore whitespace and newlines, which adds yet more ways to write the same data.
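How much of this a decoder tolerates depends on the implementation. The sketch below shows what Python's `base64.b64decode` does, as one example of such a decoder: by default it discards characters outside the alphabet, with `validate=True` it rejects them, and it refuses input whose padding is missing. The helper `try_decode` is a hypothetical name introduced here.

```python
import base64
import binascii


def try_decode(text: str, **kwargs):
    """Decode text, returning None if the decoder rejects it."""
    try:
        return base64.b64decode(text, **kwargs)
    except binascii.Error:
        return None


canonical = "cwB0AGQAAA=="
with_whitespace = "cwB0\nAGQA AA=="

# Default (lenient) mode discards the newline and the space before decoding.
lenient = try_decode(with_whitespace)

# validate=True rejects any character outside the base64 alphabet.
strict = try_decode(with_whitespace, validate=True)

# Python does not infer missing padding; the caller has to restore it.
unpadded = canonical.rstrip("=")
without_padding = try_decode(unpadded)
repadded = try_decode(unpadded + "=" * (-len(unpadded) % 4))

print(lenient == base64.b64decode(canonical))  # True
print(strict)                                  # None
print(without_padding)                         # None
print(repadded == lenient)                     # True
```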

Despite all of this, base64 encoding is still injective, meaning if x != y, then base64(x) != base64(y); as a result, you cannot get collisions and can always get the original data back. The reverse is not true: decoding is many-to-one, because a lenient decoder accepts many strings that the encoder never produces, and several of them can decode to the same data.
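Both halves of that statement can be sketched in Python. The round-trip check below uses random inputs, and `is_canonical` is a hypothetical helper (not part of any library) that flags strings the encoder itself would never produce by re-encoding the decoded bytes and comparing.

```python
import base64
import os


def is_canonical(text: str) -> bool:
    """True if text is exactly what b64encode would emit for its decoded bytes."""
    return base64.b64encode(base64.b64decode(text)).decode("ascii") == text


# Encoding then decoding always returns the original bytes (no collisions).
samples = [os.urandom(n) for n in range(32)]
round_trips = all(base64.b64decode(base64.b64encode(s)) == s for s in samples)
print(round_trips)  # True

# Decoding then encoding only returns the original string if it was canonical.
print(is_canonical("cwB0AGQAAA=="))  # True
print(is_canonical("cwB0AGQAAG=="))  # False
```

If base64 strings are used as keys or compared for equality, comparing the decoded bytes (or canonicalizing first) avoids treating two spellings of the same data as different values.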
