Why does base64 encoding require padding if the input length is not divisible by 3?

Question

What is the purpose of padding in base64 encoding? The following is an extract from Wikipedia:

"An additional pad character is allocated which may be used to force the encoded output into an integer multiple of 4 characters (or equivalently when the unencoded binary text is not a multiple of 3 bytes); these padding characters must then be discarded when decoding but still allow the calculation of the effective length of the unencoded text, when its input binary length would not be a multiple of 3 bytes (the last non-pad character is normally encoded so that the last 6-bit block it represents will be zero-padded on its least significant bits, at most two pad characters may occur at the end of the encoded stream)."

I wrote a program that can base64-encode any string and decode any base64-encoded string. What problem does padding solve?

Recommended answer

Your conclusion that padding is unnecessary is right. It's always possible to determine the length of the input unambiguously from the length of the encoded sequence.
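
The length claim can be checked with a short sketch. The helper names `decoded_length` and `decode_unpadded` are hypothetical, and the code assumes the standard base64 alphabet with the padding already stripped.

```python
import base64


def decoded_length(unpadded_len: int) -> int:
    """Number of input bytes implied by the length of an unpadded base64 string."""
    if unpadded_len % 4 == 1:
        # A single leftover character carries only 6 bits, less than one byte.
        raise ValueError("not a valid unpadded base64 length")
    # Each character carries 6 bits; bits that do not fill a whole byte are zero fill.
    return unpadded_len * 6 // 8


def decode_unpadded(text: str) -> bytes:
    """Restore the implied padding, then decode with the standard library."""
    return base64.b64decode(text + "=" * (-len(text) % 4))
```

For instance, `decode_unpadded("QU0")` re-pads the text to `QU0=` before decoding, so a single unpadded sequence loses nothing.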

However, padding is useful in situations where base64 encoded strings are concatenated in such a way that the lengths of the individual sequences are lost, as might happen, for example, in a very simple network protocol.

If unpadded strings are concatenated, it's impossible to recover the original data because information about the number of odd bytes at the end of each individual sequence is lost. However, if padded sequences are used, there's no ambiguity, and the sequence as a whole can be decoded correctly.

Suppose we have a program that base64-encodes words, concatenates them and sends them over a network. It encodes "I", "AM" and "TJM", sandwiches the results together without padding and transmits them.

  • I encodes to SQ (SQ== with padding)
  • AM encodes to QU0 (QU0= with padding)
  • TJM encodes to VEpN (also VEpN with padding, because three bytes fill the group exactly)

So the transmitted data is SQQU0VEpN. The receiver base64-decodes this as I\x04\x14\xd1Q) instead of the intended IAMTJM. The result is nonsense because the sender has destroyed information about where each word ends in the encoded sequence. If the sender had sent SQ==QU0=VEpN instead, the receiver could have decoded this as three separate base64 sequences which would concatenate to give IAMTJM.
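
The example can be reproduced with a minimal sketch in Python. The function name `decode_padded_stream` is made up for illustration. It decodes one 4-character group at a time, because standard-library decoders do not all treat padding in the middle of their input the same way.

```python
import base64

words = [b"I", b"AM", b"TJM"]

padded = [base64.b64encode(w).decode("ascii") for w in words]
unpadded = [p.rstrip("=") for p in padded]

padded_stream = "".join(padded)      # "SQ==QU0=VEpN"
unpadded_stream = "".join(unpadded)  # "SQQU0VEpN"


def decode_padded_stream(stream: str) -> bytes:
    """Decode concatenated padded base64 one 4-character group at a time."""
    if len(stream) % 4 != 0:
        raise ValueError("padded base64 must be a multiple of 4 characters")
    groups = (stream[i:i + 4] for i in range(0, len(stream), 4))
    return b"".join(base64.b64decode(group) for group in groups)


recovered = decode_padded_stream(padded_stream)  # b"IAMTJM"

# Without padding the 4-character groups fall on the wrong boundaries.
# The first eight characters decode to garbage and the ninth is left over.
garbled = base64.b64decode(unpadded_stream[:8])  # b"I\x04\x14\xd1Q)"
```

Every 4-character group of padded base64 is decodable on its own, which is why the padded stream survives concatenation.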

Why not just design the protocol to prefix each word with an integer length? Then the receiver could decode the stream correctly and there would be no need for padding.

That's a great idea, as long as we know the length of the data we're encoding before we start encoding it. But what if, instead of words, we were encoding chunks of video from a live camera? We might not know the length of each chunk in advance.

If the protocol used padding, there would be no need to transmit a length at all. The data could be encoded as it came in from the camera, each chunk terminated with padding, and the receiver would be able to decode the stream correctly.
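
A rough sketch of that idea follows. The names `encode_chunk` and `decode_stream` and the sample "frame" data are assumptions made for illustration, not part of any real protocol. The encoder never needs the chunk length in advance: it emits complete 3-byte groups as data arrives and pads only when the chunk ends.

```python
import base64
from typing import Iterable, Iterator


def encode_chunk(pieces: Iterable[bytes]) -> Iterator[str]:
    """Encode one chunk incrementally without knowing its length up front."""
    pending = b""
    for piece in pieces:
        pending += piece
        usable = len(pending) - len(pending) % 3
        if usable:
            # Whole 3-byte groups encode to whole 4-character groups.
            yield base64.b64encode(pending[:usable]).decode("ascii")
        pending = pending[usable:]
    if pending:
        # Only the end of a chunk can leave 1 or 2 bytes over, so padding appears here.
        yield base64.b64encode(pending).decode("ascii")


def decode_stream(stream: str) -> bytes:
    """Decode concatenated padded chunks one 4-character group at a time."""
    if len(stream) % 4 != 0:
        raise ValueError("padded base64 must be a multiple of 4 characters")
    groups = (stream[i:i + 4] for i in range(0, len(stream), 4))
    return b"".join(base64.b64decode(group) for group in groups)


# Two hypothetical chunks, each arriving in arbitrary pieces.
chunks = [[b"fr", b"ame", b"-1"], [b"frame", b"-02"]]
stream = "".join(text for chunk in chunks for text in encode_chunk(chunk))
data = decode_stream(stream)  # b"frame-1frame-02"
```

Note that the receiver recovers the bytes rather than the chunk boundaries: a chunk whose length is a multiple of 3 ends without any padding character.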

Obviously that's a very contrived example, but perhaps it illustrates why padding might conceivably be helpful in some situations.
