Reliably Clean Email Message Body Encoding

Question

I am writing a small piece of software in PHP which connects to an IMAP mailbox and stores the messages it contains in a MySQL DB for later processing and other goodness.

During testing I noticed some strange characters appearing in the message body when I save it raw. I am using imap_fetchbody() to extract the message body.

I noticed that using quoted_printable_decode() to clean up the message body helps! However, after a lot of research I have also learned that this will not always work, and that other methods such as utf8_encode() and base64_decode() should be used instead in some cases.

So, my question is: what is the best method for reliably cleaning an email message body with php to cover all encoding scenarios?

Recommended answer

An "email body" is nowadays actually a tree of individual MIME parts. Sometimes there's just one of them, e.g. a text/plain mail. Sometimes there's a multipart/alternative which wraps two "equivalent" copies of the message, one as text/plain and the other as text/html. Sometimes the structure is much more complicated, with many levels of nesting. It is also quite common for some of these parts to be binary content, such as images or attached ZIP files.
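
In code, this means you walk the tree and handle each leaf part on its own, instead of treating the mail as one body. Below is a minimal PHP sketch. It assumes the object shape returned by imap_fetchstructure(): an integer `type`, a `subtype` string and an optional `parts` array. The names listLeafParts() and its helpers are hypothetical. The sketch labels every leaf with its IMAP section number (RFC 3501), which is the string imap_fetchbody() takes as its `section` argument. The body of an encapsulated message/rfc822 part is numbered relative to that part. So if a forwarded multipart message sits in part 2, its children are 2.1 and 2.2.

```php
<?php
// Sketch: flatten an imap_fetchstructure()-style object into its leaf parts,
// each paired with the IMAP section number that imap_fetchbody() expects.
// Assumed shape: ->type (int), ->subtype (string), optional ->parts (array).

const MIME_MULTIPART = 1; // same value as ext/imap's TYPEMULTIPART
const MIME_MESSAGE   = 2; // same value as ext/imap's TYPEMESSAGE

// '' + 1 => '1', '2' + 1 => '2.1'
function sectionJoin(string $prefix, int $index): string
{
    return $prefix === '' ? (string) $index : $prefix . '.' . $index;
}

// Body of a message (the top-level mail, or one encapsulated in message/rfc822).
// A multipart body's children are prefix.1, prefix.2, ...; a single-part body
// is prefix.1, so a plain one-part mail is section "1".
function walkBody(object $body, string $prefix, array &$leaves): void
{
    if ($body->type === MIME_MULTIPART) {
        walkPart($body, $prefix, $leaves);
    } else {
        walkPart($body, sectionJoin($prefix, 1), $leaves);
    }
}

// One part whose own section number is already known.
function walkPart(object $part, string $number, array &$leaves): void
{
    if ($part->type === MIME_MULTIPART) {
        foreach ($part->parts ?? [] as $i => $child) {
            walkPart($child, sectionJoin($number, $i + 1), $leaves);
        }
    } elseif ($part->type === MIME_MESSAGE
        && strcasecmp($part->subtype ?? '', 'RFC822') === 0
        && !empty($part->parts)) {
        // imap_fetchstructure() exposes the forwarded message's body as parts[0].
        walkBody($part->parts[0], $number, $leaves);
    } else {
        // A leaf: text, image, application/zip, ...
        $leaves[] = ['section' => $number, 'part' => $part];
    }
}

function listLeafParts(object $structure): array
{
    $leaves = [];
    walkBody($structure, '', $leaves);
    return $leaves;
}
```

With a live connection, you can pass each `section` to imap_fetchbody($imap, $msgNo, $section). The matching `part` object then tells you how to decode the bytes that come back.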

Each of these individual MIME parts can be encoded for transport; the encoding is specified in the Content-Transfer-Encoding header of the corresponding MIME part. The two encoding schemes which you absolutely must support to interoperate are quoted-printable and base64. An important observation is that this encoding happens separately for each part. For example, it's perfectly legal for a multipart/alternative to have its text/plain part encoded with quoted-printable and its text/html part encoded in base64.
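
Because each part carries its own Content-Transfer-Encoding, you decode per part, right after fetching that part's bytes. Below is a minimal sketch. It assumes the integer codes that imap_fetchstructure() reports in each part's `encoding` field; ext/imap names 3 and 4 ENCBASE64 and ENCQUOTEDPRINTABLE. The function name decodeTransferEncoding() is hypothetical.

```php
<?php
// Sketch: undo the Content-Transfer-Encoding of ONE MIME part.
// $encoding is assumed to be the part's integer ->encoding field from
// imap_fetchstructure(): 0 7BIT, 1 8BIT, 2 BINARY, 3 BASE64,
// 4 QUOTED-PRINTABLE, 5 OTHER.
function decodeTransferEncoding(string $raw, int $encoding): string
{
    switch ($encoding) {
        case 3: // BASE64; non-strict mode silently discards the CRLF line breaks
            $decoded = base64_decode($raw);
            if ($decoded === false) {
                throw new UnexpectedValueException('Malformed base64 body');
            }
            return $decoded;
        case 4: // QUOTED-PRINTABLE; also removes "=" soft line breaks
            return quoted_printable_decode($raw);
        default:
            // 7bit, 8bit and binary are not transformed; "other" is passed through as-is.
            return $raw;
    }
}
```

This also explains why quoted_printable_decode() only helps sometimes. It is the right call for parts labelled quoted-printable and the wrong one for base64 parts. The result is still raw bytes in the part's charset, not yet Unicode text.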

When you have decoded this transfer encoding, you still have to decode the text from its character encoding to Unicode, i.e. turn the stream of bytes into Unicode text. You need to consult the charset parameter of the Content-Type MIME header, e.g. `Content-Type: text/plain; charset=ISO-8859-1`. Again, this is the part header, not the whole-message header, unless the message itself has only one part.
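
Below is a minimal sketch of this second step. It assumes the `parameters` layout of imap_fetchstructure() parts and uses iconv() for the conversion; mb_convert_encoding() would be a comparable choice. The names partCharset() and toUtf8() are hypothetical. The US-ASCII fallback is the RFC 2046 default for text/plain when no charset is given.

```php
<?php
// Sketch: find a part's own charset and convert its (transfer-decoded) bytes to UTF-8.
// $part is assumed to look like an imap_fetchstructure() part: ->parameters holds
// objects with ->attribute and ->value. When a part has no parameters it may be an
// empty object rather than an array; foreach copes with both.
function partCharset(object $part): string
{
    foreach ($part->parameters ?? [] as $param) {
        if (strcasecmp($param->attribute, 'charset') === 0 && $param->value !== '') {
            return $param->value;
        }
    }
    return 'US-ASCII'; // RFC 2046 default for text/plain without a charset
}

function toUtf8(string $bytes, string $charset): string
{
    if (strcasecmp($charset, 'UTF-8') === 0) {
        return $bytes; // already the target encoding
    }
    // iconv() returns false (and raises a notice/warning) for unknown charsets or illegal bytes.
    $converted = @iconv($charset, 'UTF-8', $bytes);
    if ($converted === false) {
        throw new UnexpectedValueException("Cannot convert from charset '$charset'");
    }
    return $converted;
}
```

Real-world mail sometimes mislabels its charset, so the exception path matters. A part declared US-ASCII that actually contains 8-bit bytes will end up there. The utf8_encode() function mentioned in the question only converts ISO-8859-1 to UTF-8, and it is deprecated as of PHP 8.2. It is therefore only correct for parts whose charset parameter says ISO-8859-1.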

All details you need to know are in RFC 2045, RFC 2046, RFC 2047 and RFC 2048 (and their corresponding updates).

Finally, there's also the interesting question of what the "main part" of an e-mail is. Suppose you have something like this:


1 multipart/mixed
  + 1.1 text/plain: "Hi, I'm forwarding Jeff's message..."
  + 1.2 message/rfc822
    + 1.2.1 multipart/alternative
       + 1.2.1.1 text/plain "Hi colleagues, I'm sending the meeting notes from..."
       + 1.2.1.2 text/html "<p>Hi colleagues,..."

This is what happens when Fred forwards Jeff's message to you. What is the "main part" here?
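
There is no single right answer; it depends on what the application does with the text. As one illustration, here is a sketch of a policy that is an assumption, not something the MIME RFCs define. It picks the first text/plain part that is neither inside an encapsulated message/rfc822 nor marked as an attachment, and falls back to text/html. The input is assumed to be an imap_fetchstructure()-style object. The names pickMainPart() and collectTextParts() are hypothetical. In IMAP section numbering, which imap_fetchbody() uses, Fred's note is section 1 and Jeff's two alternatives are 2.1 and 2.2. This policy never looks inside section 2, so it picks Fred's note. An application that wants Jeff's text would need a different rule.

```php
<?php
// Sketch of ONE possible "main part" policy (an assumption, not an RFC rule).
// Collects text/* leaves that are not attachments and not inside a message/rfc822,
// recording [IMAP section number, uppercase subtype] for each.
function collectTextParts(object $node, string $number, array &$out): void
{
    if ($node->type === 1) { // multipart: children are number.1, number.2, ...
        foreach ($node->parts ?? [] as $i => $child) {
            $childNumber = $number === '' ? (string) ($i + 1) : $number . '.' . ($i + 1);
            collectTextParts($child, $childNumber, $out);
        }
        return;
    }
    // imap_fetchstructure() only sets ->disposition when the part has one.
    $isAttachment = isset($node->disposition)
        && strcasecmp($node->disposition, 'attachment') === 0;
    if ($node->type === 0 && !$isAttachment) { // text/*
        // A single-part message (number '') is section "1".
        $out[] = [$number === '' ? '1' : $number, strtoupper($node->subtype ?? '')];
    }
    // Type 2 (message/*, e.g. a forwarded mail) and binary types are deliberately not entered.
}

// Returns the section of the first text/plain candidate, else the first
// text/html candidate, else null.
function pickMainPart(object $structure): ?string
{
    $candidates = [];
    collectTextParts($structure, '', $candidates);
    foreach (['PLAIN', 'HTML'] as $preferred) {
        foreach ($candidates as [$section, $subtype]) {
            if ($subtype === $preferred) {
                return $section;
            }
        }
    }
    return null;
}
```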
