Why does base64 encoding require padding if the input length is not divisible by 3?

Question

What is the purpose of padding in base64 encoding. The following is the extract from wikipedia:

"An additional pad character is allocated which may be used to force the encoded output into an integer multiple of 4 characters (or equivalently when the unencoded binary text is not a multiple of 3 bytes); these padding characters must then be discarded when decoding but still allow the calculation of the effective length of the unencoded text, when its input binary length would not be a multiple of 3 bytes (the last non-pad character is normally encoded so that the last 6-bit block it represents will be zero-padded on its least significant bits, at most two pad characters may occur at the end of the encoded stream)."
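
The length arithmetic in that passage can be checked with a small Python sketch built on the standard `base64` module; `pad_count` and `decoded_length` are illustrative helper names, not library functions. Every 3 input bytes become 4 characters, a final 1-byte group gets `==` and a final 2-byte group gets `=`, so a decoder can recover the original byte count from a padded string alone.

```python
import base64


def pad_count(n_bytes: int) -> int:
    """Number of '=' characters a padded encoding of n_bytes bytes ends with."""
    return (3 - n_bytes % 3) % 3


def decoded_length(encoded: str) -> int:
    """Byte length of the original data, computed from a padded base64 string."""
    if len(encoded) % 4 != 0:
        raise ValueError("padded base64 length must be a multiple of 4")
    # Each 4-character group carries 3 bytes; each trailing '=' removes one.
    trailing_pads = len(encoded) - len(encoded.rstrip("="))
    return len(encoded) // 4 * 3 - trailing_pads


for data in [b"", b"I", b"AM", b"TJM", b"IAMTJM!"]:
    enc = base64.b64encode(data).decode("ascii")
    # Padded output length is always 4 * ceil(n / 3) characters.
    assert len(enc) == 4 * -(-len(data) // 3)
    assert enc.count("=") == pad_count(len(data))
    assert decoded_length(enc) == len(data)
    print(f"{data!r:12} -> {enc!r:14} pads={pad_count(len(data))}")
```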

I wrote a program that can base64-encode any string and decode any base64-encoded string. What problem does padding solve?

Answer

Your conclusion that padding is unnecessary is right. It's always possible to determine the length of the input unambiguously from the length of the encoded sequence.
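
The length argument can be made concrete with a minimal Python sketch; `b64decode_unpadded` is a hypothetical helper, not part of the standard library. It restores the missing `=` characters purely from the string length: a remainder of 2 or 3 modulo 4 means a trailing 1- or 2-byte group, and a remainder of 1 never occurs in valid output because one character carries only 6 bits.

```python
import base64


def b64decode_unpadded(s: str) -> bytes:
    """Decode base64 whose trailing '=' padding has been stripped."""
    remainder = len(s) % 4
    if remainder == 1:
        # A lone 6-bit character cannot encode a whole byte.
        raise ValueError(f"invalid unpadded base64 length: {len(s)}")
    # Remainder 0 -> no padding, 2 -> '==', 3 -> '='
    padded = s + "=" * ((4 - remainder) % 4)
    return base64.b64decode(padded, validate=True)


for unpadded in ["SQ", "QU0", "VEpN"]:
    print(unpadded, "->", b64decode_unpadded(unpadded))
```

This works only because the decoder sees exactly one encoded value, so its length is known.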

However, padding is useful in situations where base64 encoded strings are concatenated in such a way that the lengths of the individual sequences are lost, as might happen, for example, in a very simple network protocol.

If unpadded strings are concatenated, it's impossible to recover the original data, because the information about how many leftover bytes (one or two) end each individual sequence is lost. However, if padded sequences are used, there's no ambiguity, and the sequence as a whole can be decoded correctly.

Suppose we have a program that base64-encodes words, concatenates them and sends them over a network. It encodes "I", "AM" and "TJM", sandwiches the results together without padding and transmits them.

  • I encodes to SQ (SQ== with padding)
  • AM encodes to QU0 (QU0= with padding)
  • TJM encodes to VEpN (VEpN with padding)
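
The encodings in this list, and the two streams a sender could build from them, can be reproduced with Python's standard `base64` module; the `encode` helper is illustrative, and its unpadded form is simply the padded output with the trailing `=` stripped.

```python
import base64

words = ["I", "AM", "TJM"]


def encode(word: str, pad: bool = True) -> str:
    """Base64-encode an ASCII word, optionally stripping the '=' padding."""
    encoded = base64.b64encode(word.encode("ascii")).decode("ascii")
    return encoded if pad else encoded.rstrip("=")


for word in words:
    print(f"{word!r}: unpadded={encode(word, pad=False)!r} padded={encode(word)!r}")

# The two streams a sender could transmit:
unpadded_stream = "".join(encode(w, pad=False) for w in words)
padded_stream = "".join(encode(w) for w in words)
print(unpadded_stream)  # SQQU0VEpN
print(padded_stream)    # SQ==QU0=VEpN
```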

So the transmitted data is SQQU0VEpN. The receiver base64-decodes this as I\x04\x14\xd1Q) instead of the intended IAMTJM. The result is nonsense because the sender has destroyed information about where each word ends in the encoded sequence. If the sender had sent SQ==QU0=VEpN instead, the receiver could have decoded this as three separate base64 sequences which would concatenate to give IAMTJM.
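
The receiver's side can be sketched in Python. The answer doesn't say which decoder produced `I\x04\x14\xd1Q)`; this sketch assumes a hypothetical `lenient_decode` that silently drops a trailing incomplete 4-character group, whereas Python's own `base64.b64decode` raises `binascii.Error` on the 9-character stream. The padded stream splits cleanly into 4-character groups, so decoding it one group at a time recovers each word.

```python
import base64
import binascii


def lenient_decode(s: str) -> bytes:
    """Hypothetical lenient decoder: ignores a trailing incomplete 4-char group."""
    usable = len(s) - len(s) % 4
    return base64.b64decode(s[:usable])


def decode_padded_stream(s: str) -> bytes:
    """Decode concatenated *padded* base64 one 4-character group at a time."""
    if len(s) % 4 != 0:
        raise ValueError("padded stream length must be a multiple of 4")
    return b"".join(base64.b64decode(s[i:i + 4]) for i in range(0, len(s), 4))


unpadded_stream = "SQQU0VEpN"
padded_stream = "SQ==QU0=VEpN"

print(lenient_decode(unpadded_stream))      # b'I\x04\x14\xd1Q)'
print(decode_padded_stream(padded_stream))  # b'IAMTJM'

try:
    base64.b64decode(unpadded_stream)
except binascii.Error as exc:
    print("b64decode rejects the unpadded stream:", exc)
```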

Why not just design the protocol to prefix each word with an integer length? Then the receiver could decode the stream correctly and there would be no need for padding.
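
A length-prefixed protocol along those lines might look like the following Python sketch. The wire format (the encoded character count in decimal, a colon, then unpadded base64) and the names `frame` and `unframe` are illustrative assumptions, not an established protocol.

```python
import base64


def frame(word: bytes) -> str:
    """Hypothetical framing: '<encoded length>:<unpadded base64>'."""
    encoded = base64.b64encode(word).decode("ascii").rstrip("=")
    return f"{len(encoded)}:{encoded}"


def unframe(stream: str) -> list[bytes]:
    """Split a stream of frames and decode each unpadded payload."""
    words = []
    pos = 0
    while pos < len(stream):
        colon = stream.index(":", pos)
        length = int(stream[pos:colon])
        encoded = stream[colon + 1:colon + 1 + length]
        # Restore the padding that frame() stripped before decoding.
        words.append(base64.b64decode(encoded + "=" * (-length % 4)))
        pos = colon + 1 + length
    return words


stream = "".join(frame(w) for w in [b"I", b"AM", b"TJM"])
print(stream)           # 2:SQ3:QU04:VEpN
print(unframe(stream))  # [b'I', b'AM', b'TJM']
```

`frame` has to see a whole word before it can write the prefix, which is exactly the limitation the answer raises next.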

That's a great idea, as long as we know the length of the data we're encoding before we start encoding it. But what if, instead of words, we were encoding chunks of video from a live camera? We might not know the length of each chunk in advance.

If the protocol used padding, there would be no need to transmit a length at all. The data could be encoded as it came in from the camera, each chunk terminated with padding, and the receiver would be able to decode the stream correctly.
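
The streaming case can be sketched in Python as follows. The chunk contents and sizes are made up to stand in for camera output, and `encode_chunks` and `StreamDecoder` are hypothetical names. The receiver buffers incoming text and decodes every complete 4-character group as soon as it arrives, so it never needs a length and doesn't care how the network splits the text; because each chunk is encoded with padding, chunk boundaries always fall on group boundaries.

```python
import base64


def encode_chunks(chunks):
    """Sender: base64-encode each chunk as it arrives, padding included."""
    for chunk in chunks:
        yield base64.b64encode(chunk).decode("ascii")


class StreamDecoder:
    """Receiver: decode complete 4-character groups as text arrives."""

    def __init__(self) -> None:
        self._buffer = ""

    def feed(self, text: str) -> bytes:
        self._buffer += text
        complete = len(self._buffer) - len(self._buffer) % 4
        ready, self._buffer = self._buffer[:complete], self._buffer[complete:]
        # Decode group by group: '=' may appear mid-stream at chunk ends.
        return b"".join(
            base64.b64decode(ready[i:i + 4]) for i in range(0, complete, 4)
        )


# Made-up "camera" chunks of uneven sizes.
chunks = [b"frame-1", b"fr2", b"f3", b"frame-four"]
wire = "".join(encode_chunks(chunks))

# Deliver the text in arbitrary 5-character network reads.
decoder = StreamDecoder()
received = b"".join(decoder.feed(wire[i:i + 5]) for i in range(0, len(wire), 5))
assert received == b"".join(chunks)
print(received)
```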

Obviously that's a very contrived example, but perhaps it illustrates why padding might conceivably be helpful in some situations.
