Why do we use Base64?


Problem description


Wikipedia says

Base64 encoding schemes are commonly used when there is a need to encode binary data that needs to be stored and transferred over media that are designed to deal with textual data. This is to ensure that the data remains intact without modification during transport.

But isn't data always stored/transmitted in binary anyway, because our machines' memory stores binary and it just depends on how you interpret it? So, whether you encode the bit pattern 010011010110000101101110 as Man in ASCII or as TWFu in Base64, you are eventually going to store the same bit pattern.

If the ultimate encoding is in terms of zeros and ones and every machine and media can deal with them, how does it matter if the data is represented as ASCII or Base64?

What does it mean "media that are designed to deal with textual data"? They can deal with binary => they can deal with anything.


Thanks everyone, I think I understand now.

When we send data, we cannot be sure that it will be interpreted in the format we intended. So, we send the data coded in some format (like Base64) that both parties understand. That way, even if the sender and receiver interpret the same things differently, the data will not be misinterpreted, because they agree on the coded format.

From Mark Byers' example:

If I want to send

Hello
world!

One way is to send it in ASCII like

72 101 108 108 111 10 119 111 114 108 100 33

But byte 10 might not be interpreted correctly as a newline at the other end. So, we use a subset of ASCII to encode it like this

83 71 86 115 98 71 56 115 67 110 100 118 99 109 120 107 73 61 61

which, at the cost of transferring more data for the same amount of information, ensures that the receiver can decode the data in the intended way, even if the receiver happens to have a different interpretation of the rest of the character set.

Solution

Your first mistake is thinking that ASCII encoding and Base64 encoding are interchangeable. They are not. They are used for different purposes.

  • When you encode text in ASCII, you start with a text string and convert it to a sequence of bytes.
  • When you encode data in Base64, you start with a sequence of bytes and convert it to a text string.
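Here is a minimal Python sketch of that distinction, using the Man/TWFu bit pattern from the question (Python's standard base64 module is used purely for illustration):

import base64

# Encoding text in ASCII: text string -> sequence of bytes
text = "Man"
ascii_bytes = text.encode("ascii")                         # bytes 77 97 110

# Encoding data in Base64: sequence of bytes -> text string
b64_text = base64.b64encode(ascii_bytes).decode("ascii")   # 'TWFu'

print(list(ascii_bytes))  # [77, 97, 110]
print(b64_text)           # TWFu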

To understand why Base64 was necessary in the first place, we need a little history of computing.


Computers communicate in binary - 0s and 1s - but people typically want to communicate with richer forms of data such as text or images. In order to transfer this data between computers, it first has to be encoded into 0s and 1s, sent, and then decoded again. To take text as an example - there are many different ways to perform this encoding. It would be much simpler if we could all agree on a single encoding, but sadly this is not the case.

Originally a lot of different encodings were created (e.g. Baudot code) which used a different number of bits per character, until eventually ASCII became a standard with 7 bits per character. However, most computers store binary data in bytes consisting of 8 bits each, so ASCII is unsuitable for transferring this type of data. Some systems would even wipe the most significant bit. Furthermore, differences in line-ending conventions across systems meant that the ASCII characters 10 and 13 were also sometimes modified.

To solve these problems, Base64 encoding was introduced. This allows you to encode arbitrary bytes into bytes which are known to be safe to send without getting corrupted (ASCII alphanumeric characters and a couple of symbols). The disadvantage is that encoding the message using Base64 increases its length - every 3 bytes of data are encoded as 4 ASCII characters.
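A quick way to see the 3-to-4 overhead, sketched with Python's standard base64 module (purely for illustration):

import base64

data = bytes(range(6))            # 6 arbitrary bytes
encoded = base64.b64encode(data)  # 8 ASCII characters

print(len(data), len(encoded))    # 6 8  -> every 3 bytes become 4 characters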

To send text reliably, you can first encode it to bytes using a text encoding of your choice (for example UTF-8) and then Base64-encode the resulting binary data into a text string that is safe to send encoded as ASCII. The receiver will have to reverse this process to recover the original message. This of course requires that the receiver knows which encodings were used, and this information often needs to be sent separately.
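A sketch of that round trip in Python; the choice of UTF-8 and of the standard base64 module here are assumptions for illustration, and in practice both sides must agree on them:

import base64

# Sender: text -> UTF-8 bytes -> Base64 text (safe ASCII)
message = "Hello\nworld!"
utf8_bytes = message.encode("utf-8")
wire_text = base64.b64encode(utf8_bytes).decode("ascii")   # 'SGVsbG8Kd29ybGQh'

# Receiver: reverses the process, knowing UTF-8 and Base64 were used
recovered = base64.b64decode(wire_text).decode("utf-8")
assert recovered == message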

Historically it has been used to encode binary data in email messages, where the email server might modify line endings. A more modern example is the use of Base64 encoding to embed image data directly in HTML source code. Here it is necessary to encode the data to avoid characters like '<' and '>' being interpreted as tags.
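In the HTML case the encoded bytes usually end up in a data: URI; a small sketch follows (the file name icon.png and the image/png MIME type are made-up examples):

import base64

# Hypothetical example: embed a small PNG directly in an <img> tag
with open("icon.png", "rb") as f:
    b64_png = base64.b64encode(f.read()).decode("ascii")

img_tag = '<img src="data:image/png;base64,' + b64_png + '">'
# The Base64 alphabet contains no '<' or '>', so the data cannot be mistaken for markup.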


Here is a working example:

I wish to send a text message with two lines:

Hello
world!

If I send it as ASCII (or UTF-8) it will look like this:

72 101 108 108 111 10 119 111 114 108 100 33

The byte 10 is corrupted in some systems, so we can Base64-encode these bytes as a Base64 string:

SGVsbG8Kd29ybGQh

Which, when encoded using ASCII, looks like this:

83 71 86 115 98 71 56 75 100 50 57 121 98 71 81 104

All the bytes here are known safe bytes, so there is very little chance that any system will corrupt this message. I can send this instead of my original message and let the receiver reverse the process to recover the original message.
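If you want to check these numbers yourself, the whole example can be reproduced in a few lines of Python (again just an illustration with the standard base64 module):

import base64

message = "Hello\nworld!"
raw = message.encode("ascii")
print(list(raw))                # [72, 101, 108, 108, 111, 10, 119, 111, 114, 108, 100, 33]

encoded = base64.b64encode(raw)
print(encoded.decode("ascii"))  # SGVsbG8Kd29ybGQh
print(list(encoded))            # [83, 71, 86, 115, 98, 71, 56, 75, 100, 50, 57, 121, 98, 71, 81, 104]

# The receiver reverses the process to recover the original two-line message
print(base64.b64decode(encoded).decode("ascii"))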
