Content Transfer Encoding 7bit or 8bit

Question

When sending email content, the "Content-Transfer-Encoding" header has to be set. I looked at the headers of many emails that I received: some use "7bit" and some use "8bit".

What is the difference between these two? Which is recommended? Is there any special encoding required for email body in order to set these headers?

Recommended answer

It can be a bit dense to read, but the "Content-Transfer-Encoding" section of RFC 1341 has all of the details:

http://www.w3.org/Protocols/rfc1341/5_Content-Transfer-Encoding.html

The situation kinda goes from bad to worse. Here's my summary:

SMTP, by definition (RFC 821), limits mail to lines of 1000 characters of 7 bits each. That means that none of the bytes you send down the pipe can have the most significant ("highest-order") bit set to "1".

The content that we want to send will often not obey this restriction inherently. Think of an image file, or a text file that contains Unicode characters: the bytes of these files will often have their 8th bit set to "1". SMTP doesn't allow this, so you need to use "transfer encoding" to describe how you've worked around the mismatch.
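
To make the "8th bit" point concrete, here is a small Python sketch that lists which bytes in a buffer have their most significant bit set. The helper name `high_bit_bytes` and the sample inputs are illustrative assumptions, not anything defined by the RFCs.

```python
def high_bit_bytes(data: bytes) -> list[int]:
    """Return the offsets of bytes whose most significant bit is 1."""
    return [i for i, b in enumerate(data) if b & 0x80]


# Sample inputs (chosen for illustration only).
ascii_text = "plain English text".encode("utf-8")
unicode_text = "café".encode("utf-8")            # 'é' becomes the two bytes 0xC3 0xA9
png_signature = bytes([0x89, 0x50, 0x4E, 0x47])  # first four bytes of a PNG file

print(high_bit_bytes(ascii_text))     # [] : nothing here violates the 7-bit rule
print(high_bit_bytes(unicode_text))   # [3, 4]
print(high_bit_bytes(png_signature))  # [0]
```

Any non-empty result means the data cannot be sent as-is over a strictly 7-bit transport.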

The values for the Content-Transfer-Encoding header describe the rule that you've chosen to solve this problem.

7bit simply means "My data consists only of US-ASCII characters, which only use the lower 7 bits for each character." You're basically guaranteeing that all of the bytes in your content already adhere to the restrictions of SMTP, and so it needs no special treatment. You can just read it as-is.

Note that when you choose 7bit, you're agreeing that all of the lines in your content are less than 1000 characters in length.

As long as your content adheres to these rules, 7bit is the best transfer encoding, since there's no extra work necessary; you just read/write the bytes as they come off the pipe. It's also easy to eyeball 7bit content and make sense of it. The idea here is that if you're just writing in "plain English text" you'll be fine. But that wasn't true in 2005 and it isn't true today.

8bit means "My data may include extended ASCII characters; they may use the 8th (highest) bit to indicate special characters outside of the standard US-ASCII 7-bit characters." As with 7bit, there's still a 1000-character line limit.

8bit, just like 7bit, does not actually do any transformation of the bytes as they're written to or read from the wire. It just means that you're not guaranteeing that none of the bytes will have the highest bit set to "1".

This seems like a step up from 7bit, since it gives you more freedom in your content. However, RFC 1341 contains this tidbit:


As of the publication of this document, there are no standardized Internet transports for which it is legitimate to include unencoded 8-bit or binary data in mail bodies. Thus there are no circumstances in which the "8bit" or "binary" Content-Transfer-Encoding is actually legal on the Internet.

RFC 1341 came out over 20 years ago. Since then we've gotten 8bit MIME Extensions in RFC 6152. But even then, line limits still may apply:


Note that this extension does NOT eliminate the possibility of an SMTP server limiting line length; servers are free to implement this extension but nevertheless set a line length limit no lower than 1000 octets.



Binary Encoding

binary is the same as 8bit, except that there's no line length restriction. You can still include any characters you want, and there's no extra encoding. As with 8bit, RFC 1341 states that it's not really a legitimate transfer encoding. RFC 3030 extended this with BINARYMIME.
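
The three "no transformation" labels described so far differ only in what they promise about the bytes. A rough Python sketch of that decision follows. The function name `classify_body` is hypothetical, and the 998-octet limit (1000 minus the trailing CRLF) is an assumption about how the line limit is counted.

```python
def classify_body(body: bytes, max_line: int = 998) -> str:
    """Pick the identity transfer encoding whose promises this body satisfies.

    Assumes lines are separated by CRLF and that max_line excludes the CRLF.
    """
    lines = body.split(b"\r\n")
    if any(len(line) > max_line for line in lines):
        return "binary"   # no line-length promise can be made
    if any(b > 0x7F for b in body):
        return "8bit"     # short lines, but some bytes have the high bit set
    return "7bit"         # short lines and only 7-bit bytes


print(classify_body(b"Hello, world\r\n"))                 # 7bit
print(classify_body("Grüße\r\n".encode("utf-8")))         # 8bit
print(classify_body(b"x" * 5000))                         # binary
```

Whether a server actually accepts "8bit" or "binary" still depends on the SMTP extensions it advertises, so this only describes the content, not what is safe to send.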

Before the 8BITMIME extension, there needed to be a way to send content that couldn't be 7bit over SMTP. HTML files (which might have more than 1000-character lines) and files with international characters are good examples of this. The quoted-printable encoding (Defined in Section 5.1 of RFC 1341) is designed to handle this. It does two things:


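  • Defines how to escape non-US-ASCII characters so that they can be represented using only 7-bit characters. (Short version: they show up as an equals sign plus two 7-bit characters.)

  • Defines that lines will be no longer than 76 characters, and that line breaks will be represented using special characters (which are then escaped).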
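
Both behaviours can be observed with Python's standard `quopri` module. This is only a sketch: the sample strings and the helper name `qp_encode` are made up for illustration.

```python
import quopri


def qp_encode(data: bytes) -> bytes:
    """Encode bytes as quoted-printable using the standard library."""
    return quopri.encodestring(data)


text = "Grüße aus Köln".encode("utf-8")
encoded = qp_encode(text)
print(encoded)  # each non-ASCII byte shows up as '=' followed by two hex digits

long_line = b"x" * 200
wrapped = qp_encode(long_line)
for line in wrapped.splitlines():
    print(len(line), line[-1:])  # continued lines end in '=', a "soft" line break

# Decoding reverses both the escapes and the soft line breaks.
print(quopri.decodestring(encoded) == text)
print(quopri.decodestring(wrapped) == long_line)
```

The ASCII letters stay readable, which is why quoted-printable is usually chosen for text that is mostly ASCII with a few exceptions.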

Quoted Printable, because of the escaping and short lines, is much harder to read by a human than 7bit or 8bit, but it does support a much wider range of possible content.

If your data is largely non-text (ex: an image file), you don't have many options. 7bit is off the table. 8bit and binary were unsupported prior to the MIME extension RFCs. quoted-printable would work, but is really inefficient (every byte is going to be represented by 3 characters).

base64 is a good solution for this type of data. It encodes 3 raw bytes as 4 US-ASCII characters, which is relatively efficient. RFC 1341 further limits the line length of base64-encoded data to 76 characters to fit within an SMTP message, but that's relatively easy to manage when you're just splitting or concatenating arbitrary characters at fixed lengths.
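
The 3-to-4 ratio and the 76-character wrapping are easy to check with Python's standard `base64` module. The sample data below is arbitrary and only there for illustration.

```python
import base64

# 3 raw bytes always become 4 ASCII characters.
print(base64.b64encode(b"abc"))  # b'YWJj'

raw = bytes(range(256)) * 3          # 768 bytes of arbitrary binary data
flat = base64.b64encode(raw)         # one long ASCII line, no wrapping
wrapped = base64.encodebytes(raw)    # same data with a newline after every 76 characters

print(len(raw), len(flat))           # 768 1024
print(max(len(line) for line in wrapped.splitlines()))  # 76

# Decoding ignores the inserted newlines and restores the original bytes.
print(base64.decodebytes(wrapped) == raw)
```

Note that `encodebytes` separates lines with a bare "\n"; a mail library normally converts these to CRLF when the message is serialized.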

The big downside is that base64-encoded data is pretty much entirely unreadable by humans, even if it's just "plain" text underneath.
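
On the question of whether the body needs special encoding to match the header: in practice a mail library usually does both steps together. As one possible illustration, Python's standard `email` package accepts the desired transfer encoding through the `cte` argument of `EmailMessage.set_content` and encodes the body to match. The addresses, subject, and the helper name `build_message` are placeholders.

```python
from email.message import EmailMessage


def build_message(body: str, cte: str) -> EmailMessage:
    """Build a text message whose body is stored using the requested transfer encoding."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"       # placeholder address
    msg["To"] = "recipient@example.com"      # placeholder address
    msg["Subject"] = "Transfer encoding demo"
    msg.set_content(body, cte=cte)           # sets the header and encodes the payload
    return msg


for cte in ("quoted-printable", "base64"):
    msg = build_message("Grüße aus Köln\n", cte)
    print(msg["Content-Transfer-Encoding"])
    print(msg.get_payload())                 # the encoded form that goes on the wire
    print(msg.get_content())                 # the decoded text a reader would see
```

Setting the header by hand without encoding the body accordingly would produce a message whose label does not match its content.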
