Content Transfer Encoding: 7bit or 8bit

Question

When sending email content, it is required to set the "Content-Transfer-Encoding" header. I looked at the headers of many emails that I received: some use "7bit" and some use "8bit".

What is the difference between these two? Which is recommended? Does the email body need any special encoding in order to set these headers?

Recommended answer

It can be a bit dense to read, but the "Content-Transfer-Encoding" section of RFC 1341 has all of the details:

http://www.w3.org/Protocols/rfc1341/5_Content-Transfer-Encoding.html

The situation kinda goes from bad to worse. Here's my summary:

SMTP, by definition (RFC 821), limits mail to lines of 1000 characters of 7 bits each. That means that none of the bytes you send down the pipe can have the most significant ("highest-order") bit set to "1".

The content that we want to send will often not obey this restriction inherently. Think of an image file, or a text file that contains Unicode characters: the bytes of these files will often have their 8th bit set to "1". SMTP doesn't allow this, so you need to use "transfer encoding" to describe how you've worked around the mismatch.

The values for the Content-Transfer-Encoding header describe the rule that you've chosen to solve this problem.

7bit simply means "My data consists only of US-ASCII characters, which only use the lower 7 bits for each character." You're basically guaranteeing that all of the bytes in your content already adhere to the restrictions of SMTP, and so it needs no special treatment. You can just read it as-is.

Note that when you choose 7bit, you're agreeing that all of the lines in your content are less than 1000 characters in length.

As long as your content adheres to these rules, 7bit is the best transfer encoding, since there's no extra work necessary; you just read/write the bytes as they come off the pipe. It's also easy to eyeball 7bit content and make sense of it. The idea here is that if you're just writing in "plain English text" you'll be fine. But that wasn't true in 2005 and it isn't true today.

8bit means "My data may include extended ASCII characters; they may use the 8th (highest) bit to indicate special characters outside of the standard US-ASCII 7-bit characters." As with 7bit, there's still a 1000-character line limit.

8bit, just like 7bit, does not actually do any transformation of the bytes as they're written to or read from the wire. It just means that you're no longer guaranteeing that every byte has its highest bit set to "0".
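The distinction between the unencoded labels can be sketched as a small check. This is a simplified illustration, not a full validator: the function name `narrowest_identity_encoding` is hypothetical, and the 998-octet default is an assumption (the 1000-character limit minus the trailing CRLF).

```python
def narrowest_identity_encoding(body: bytes, max_line: int = 998) -> str:
    """Return "7bit", "8bit", or "binary" for a body that is sent unencoded.

    Simplified: only the line-length limit and the high bit are checked.
    """
    # Lines that exceed the limit rule out both 7bit and 8bit.
    lines = body.split(b"\r\n")
    if any(len(line) > max_line for line in lines):
        return "binary"
    # Any byte with the highest bit set rules out 7bit.
    if any(byte > 0x7F for byte in body):
        return "8bit"
    return "7bit"


print(narrowest_identity_encoding(b"Hello, world\r\n"))              # 7bit
print(narrowest_identity_encoding("Grüße\r\n".encode("utf-8")))      # 8bit
print(narrowest_identity_encoding(b"x" * 2000))                      # binary
```

None of the three labels changes the bytes; the function only reports which promise the sender could honestly make about them.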

This seems like a step up from 7bit, since it gives you more freedom in your content. However, RFC 1341 contains this tidbit:

As of the publication of this document, there are no standardized Internet transports for which it is legitimate to include unencoded 8-bit or binary data in mail bodies. Thus there are no circumstances in which the "8bit" or "binary" Content-Transfer-Encoding is actually legal on the Internet.

RFC 1341 came out over 20 years ago. Since then we've gotten 8bit MIME Extensions in RFC 6152. But even then, line limits still may apply:

Note that this extension does NOT eliminate the possibility of an SMTP server limiting line length; servers are free to implement this extension but nevertheless set a line length limit no lower than 1000 octets.

Binary Encoding

binary is the same as 8bit, except that there's no line length restriction. You can still include any characters you want, and there's no extra encoding. Similar to 8bit, RFC 1341 states that it's not really a legitimate transfer encoding. RFC 3030 extended this with BINARYMIME.
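Whether 8bit or binary can actually be used depends on what the receiving server advertises in its EHLO reply. A minimal sketch with Python's smtplib, assuming an already connected client; the helper name `raw_encodings_allowed` and the host name in the usage comment are hypothetical.

```python
import smtplib


def raw_encodings_allowed(client: smtplib.SMTP) -> list[str]:
    """List the unencoded transfer encodings the server says it accepts.

    Call this only after client.ehlo(); has_extn() reads the EHLO reply.
    """
    allowed = ["7bit"]
    if client.has_extn("8bitmime"):
        allowed.append("8bit")
    # RFC 3030 requires BINARYMIME to be used together with CHUNKING.
    if client.has_extn("binarymime") and client.has_extn("chunking"):
        allowed.append("binary")
    return allowed


# Hypothetical usage against a real server (the host name is a placeholder):
# with smtplib.SMTP("mail.example.com", 587) as client:
#     client.ehlo()
#     print(raw_encodings_allowed(client))
```

If the list comes back as just 7bit, the body has to go through quoted-printable or base64 instead.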

Before the 8BITMIME extension, there needed to be a way to send content that couldn't be 7bit over SMTP. HTML files (which might have more than 1000-character lines) and files with international characters are good examples of this. The quoted-printable encoding (defined in Section 5.1 of RFC 1341) is designed to handle this. It does two things:

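  • Defines how to escape non-US-ASCII characters so that they can be represented using only 7-bit characters. (Short version: they get displayed as an equals sign plus two 7-bit characters.)
  • Defines that lines will be no longer than 76 characters, and that line breaks will be represented using special characters (which are then escaped).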

Quoted Printable, because of the escaping and short lines, is much harder to read by a human than 7bit or 8bit, but it does support a much wider range of possible content.
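Both rules are easy to see with Python's standard quopri module. This is only an illustration of the encoding's effect; the sample strings are made up.

```python
import quopri

# Rule 1: each non-US-ASCII byte becomes "=" plus two hex digits.
encoded = quopri.encodestring("café".encode("utf-8"))
print(encoded)  # b'caf=C3=A9'

# Rule 2: long lines are broken with a trailing "=" (a soft line break),
# so no encoded line is longer than 76 characters.
long_line = ("é" * 100).encode("utf-8")
wrapped = quopri.encodestring(long_line)
print(max(len(line) for line in wrapped.split(b"\n")))

# Decoding removes the escapes and the soft line breaks again.
print(quopri.decodestring(wrapped) == long_line)  # True
```

Text that is mostly ASCII stays largely readable, which is the main reason to prefer this encoding over base64 for such content.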

If your data is largely non-text (ex: an image file), you don't have many options. 7bit is off the table. 8bit and binary were unsupported prior to the MIME extension RFCs. quoted-printable would work, but is really inefficient (every byte that has to be escaped is represented by 3 characters).

base64 is a good solution for this type of data. It encodes 3 raw bytes as 4 US-ASCII characters, which is relatively efficient. RFC 1341 further limits the line length of base64-encoded data to 76 characters to fit within an SMTP message, but that's relatively easy to manage when you're just splitting or concatenating arbitrary characters at fixed lengths.
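The 3-to-4 ratio and the 76-character wrapping can be checked with Python's standard base64 module. The input below is arbitrary bytes standing in for an image file.

```python
import base64

# Three raw bytes always become four US-ASCII characters.
print(base64.b64encode(b"Man"))  # b'TWFu'

# 768 arbitrary bytes standing in for binary data such as an image.
raw = bytes(range(256)) * 3
unwrapped = base64.b64encode(raw)
print(len(raw), len(unwrapped))  # 768 1024

# encodebytes() produces the same characters, split into lines of
# at most 76 characters, which is the form used in mail bodies.
wrapped = base64.encodebytes(raw)
print(max(len(line) for line in wrapped.splitlines()))  # 76

# Decoding ignores the line breaks and restores the original bytes.
print(base64.decodebytes(wrapped) == raw)  # True
```

The size overhead is a fixed one third plus the line breaks, regardless of what the bytes contain.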

The big downside is that base64-encoded data is pretty much entirely unreadable by humans, even if it's just "plain" text underneath.
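To tie this back to the question of whether the body needs special encoding: the header and the body have to agree, so in practice it is easiest to let a mail library set both together. A minimal sketch with Python's standard email package; the helper name `build_message`, the subject, and the sample text are made up for illustration.

```python
from email.message import EmailMessage


def build_message(body: str, cte: str) -> EmailMessage:
    """Build a text message whose body is stored using the given encoding."""
    msg = EmailMessage()
    msg["Subject"] = "Transfer encoding demo"
    # set_content() encodes the body and sets Content-Transfer-Encoding to match.
    msg.set_content(body, cte=cte)
    return msg


for cte in ("quoted-printable", "base64"):
    msg = build_message("Grüße aus Köln\n", cte)
    print(msg["Content-Transfer-Encoding"])
    print(msg.get_payload())
```

Printing the payload for each choice shows the readability difference described above: the quoted-printable version is still mostly legible, the base64 version is not.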
