What are the "parts" in a multipart email?


Question

A bit of context...

Some time ago, I wrote a Python program that deals with email messages. One thing that always came up was determining whether an email is "multipart" or not.

After a bit of research, I learned that it has something to do with emails containing HTML, attachments, etc., but I didn't really understand it.

My use of it was limited to two cases:

1. When I had to save the attachments from a raw email

I just found this on the internet (probably on here; sorry for not crediting the person who wrote it, but I can't seem to find him again :/) and pasted it into my code.

import os

def downloadAttachments(emailMsg, pathToSaveFile):
    r"""
    Save attachments to pathToSaveFile (example: pathToSaveFile = r"C:\Program Files")
    and return the list of their paths.
    """
    att_path_list = []
    for part in emailMsg.walk():
        # multipart parts are just containers, so we skip them
        if part.get_content_maintype() == 'multipart':
            continue

        # is this part an attachment? (only parts with a Content-Disposition header are kept)
        if part.get('Content-Disposition') is None:
            continue

        filename = part.get_filename()
        if not filename:
            # nothing to name the file after, so skip it
            continue

        # basename() drops any directory components the sender put in the name
        att_path = os.path.join(pathToSaveFile, os.path.basename(filename))

        # check if it's already there
        if not os.path.isfile(att_path):
            # finally write the stuff
            with open(att_path, 'wb') as fp:
                fp.write(part.get_payload(decode=True))
        att_path_list.append(att_path)
    return att_path_list

2. When I had to get the text from a raw email

Also pasted from someone on the internet, without really understanding how it works.

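def get_text(emailMsg):
    """
    Output: body of the email (text content)
    """
    if emailMsg.is_multipart():
        # descend into the first subpart, whatever its content type happens to be
        return get_text(emailMsg.get_payload(0))
    else:
        # leaf part: payload with its transfer encoding (base64, quoted-printable) undone, as bytes
        return emailMsg.get_payload(decode=True)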

What I do understand...

Is that if the email message is multipart, the parts can be iterated over.
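
To make "parts" concrete, here is a minimal sketch using only Python's standard `email` package (Python 3.6+ assumed). It builds a hypothetical message with a plain-text body, an HTML alternative, and one attachment, then prints its part tree. `build_sample` and `describe_tree` are illustrative names, not part of the code above.

```python
import email
from email import policy
from email.message import EmailMessage


def build_sample():
    """Build a hypothetical message: text + HTML alternatives plus one attachment."""
    msg = EmailMessage()
    msg["Subject"] = "Example"
    msg.set_content("Hello in plain text")
    msg.add_alternative("<p>Hello in <b>HTML</b></p>", subtype="html")
    msg.add_attachment(b"\x00\x01\x02", maintype="application",
                       subtype="octet-stream", filename="data.bin")
    return msg.as_bytes()


def describe_tree(part, depth=0):
    """Return one line per part: its content type, indented by nesting depth."""
    lines = ["  " * depth + part.get_content_type()]
    if part.is_multipart():
        # a multipart part is only a container; its payload is a list of sub-parts
        for sub in part.get_payload():
            lines.extend(describe_tree(sub, depth + 1))
    return lines


if __name__ == "__main__":
    parsed = email.message_from_bytes(build_sample(), policy=policy.default)
    print("\n".join(describe_tree(parsed)))
    # multipart/mixed
    #   multipart/alternative
    #     text/plain
    #     text/html
    #   application/octet-stream
```

`walk()` visits these same parts in the same depth-first order, just flattened; the `multipart/*` entries are the containers that `downloadAttachments` skips.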

My question is

What exactly are these parts? How do you know which one is html for example? Or which one is an attachment? Or just the body?

Answer

There is no strict hierarchy or guidance for how exactly to use multipart messages. MIME simply defines a way to collect multiple payloads into a single email message. One of the original motivations, I believe, was to be able to embed pictures in text; but attaching binaries to a text message, and more generally creating structured messages whose payloads are related in arbitrary ways, is something that has simply been there for applications to use however they see fit.
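
A sketch of what "collecting multiple payloads" looks like on the wire, assuming Python 3.6+ and the standard `email` package: each part is its own header block plus body, and the parts are separated by a boundary string declared in the top-level `Content-Type`. The message contents and file names are made up for illustration.

```python
import email
from email import policy
from email.message import EmailMessage

# Hypothetical message: one text body and two unrelated attachments in one envelope.
msg = EmailMessage()
msg["Subject"] = "Two payloads"
msg.set_content("See the attached files.")
msg.add_attachment("a,b\n1,2\n", subtype="csv", filename="table.csv")
# placeholder bytes, not a real PNG image
msg.add_attachment(b"\x89PNG placeholder", maintype="image", subtype="png",
                   filename="pic.png")

# Serialize once; the generator picks a boundary and writes it into Content-Type.
raw = msg.as_string()
boundary = email.message_from_string(raw, policy=policy.default).get_boundary()

# Each part starts after a "--<boundary>" line; "--<boundary>--" closes the container.
delimiters = [line for line in raw.splitlines() if line.startswith("--" + boundary)]

if __name__ == "__main__":
    print(raw)
```

Printing `raw` shows three header-plus-body blocks (text/plain, text/csv, image/png) between the boundary lines; nothing in the format says how they relate to each other.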

A common misunderstanding is to assume a hierarchy of a "main" part and "subordinate" parts. It's certainly possible to create this structure, but it is by no means universally done. In fact, most multipart messages simply have a sequence of parts without any hierarchy. The user's email client will commonly pick one of the "inline" parts as the preferred "main" part to display in the message pane, but this is by no means dictated by the standard, nor can the sending party enforce it.
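
Python's standard library exposes this receiver-side choice as `EmailMessage.get_body()`, whose `preferencelist` argument ranks the candidate subtypes (`related`, `html`, `plain`). A minimal sketch, assuming the message is parsed with `policy.default` (needed to get `EmailMessage` objects); the message itself is hypothetical:

```python
import email
from email import policy
from email.message import EmailMessage


def make_message():
    """Hypothetical message: plain + HTML alternatives, plus a PDF attachment."""
    msg = EmailMessage()
    msg["Subject"] = "Hi"
    msg.set_content("plain version")
    msg.add_alternative("<p>html version</p>", subtype="html")
    msg.add_attachment(b"%PDF-placeholder", maintype="application",
                       subtype="pdf", filename="report.pdf")
    # round-trip through bytes so we work on a parsed message, as a mail client would
    return email.message_from_bytes(msg.as_bytes(), policy=policy.default)


msg = make_message()
# The receiver, not the sender, decides which inline part counts as "the body".
html_first = msg.get_body(preferencelist=("html", "plain"))
plain_first = msg.get_body(preferencelist=("plain", "html"))

if __name__ == "__main__":
    print(html_first.get_content_type())   # text/html
    print(plain_first.get_content_type())  # text/plain
```

Parts with `Content-Disposition: attachment` are never considered by `get_body()`, so the PDF is skipped regardless of the preference list.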

Each MIME part has a set of headers (Content-Type, Content-Transfer-Encoding, and Content-Disposition) which tell you the type, encoding, and disposition; for parts of type text/* the default disposition is "inline" (so it is often not explicitly spelled out), whereas most other parts have a default disposition of "attachment". You'll need to refer to the pertinent standards (RFC 2045 for types and encodings, RFC 2183 for Content-Disposition) for a strict definition, but probably take them with a grain of salt, because many real-world applications are not particularly RFC-conformant.
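
The defaults described here can be written as a small helper. This is a sketch of the rule of thumb above, not something the `email` package does for you: `get_content_disposition()` (Python 3.5+) returns `None` when the header is absent, so the fallback is our own. The `7bit` default for a missing `Content-Transfer-Encoding` comes from RFC 2045. The raw message and the helper names are hypothetical.

```python
import email
from email import policy

# Hypothetical raw message: only the PDF states its disposition explicitly.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain; charset="utf-8"

Hello, this is the body.
--XYZ
Content-Type: image/png
Content-Transfer-Encoding: base64

iVBORw0KGgo=
--XYZ
Content-Type: application/pdf
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="report.pdf"

JVBERi0=
--XYZ--
"""


def effective_disposition(part):
    """Explicit Content-Disposition if present, else inline for text/*, attachment otherwise."""
    disp = part.get_content_disposition()  # 'inline', 'attachment', or None
    if disp is not None:
        return disp
    return "inline" if part.get_content_maintype() == "text" else "attachment"


def summarize(msg):
    """(type, transfer encoding, disposition, filename) for every leaf part."""
    rows = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no payload of their own
        rows.append((
            part.get_content_type(),
            str(part.get("Content-Transfer-Encoding", "7bit")),  # RFC 2045 default
            effective_disposition(part),
            part.get_filename(),
        ))
    return rows


if __name__ == "__main__":
    for row in summarize(email.message_from_string(RAW, policy=policy.default)):
        print(row)
```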

For your concrete question, find the topmost leaf parts which are (implicitly or explicitly) inline, and display one which supports your use case as the "main" one. If you want to enforce HTML as the preferred format, you can do that; but many email applications leave this decision to the user, and some users will definitely prefer plain text when it's available, whether because of technical necessity, physical disabilities, or personal taste.
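
One possible implementation of that recipe, as a sketch assuming Python 3.6+, `policy.default`, and the inline/attachment defaults described in the previous paragraph (all function names are hypothetical): collect inline leaf parts breadth-first so shallower parts come first, then return the first one whose type matches the caller's preference order.

```python
from collections import deque
import email
from email import policy
from email.message import EmailMessage


def effective_disposition(part):
    """Explicit Content-Disposition if present, else inline for text/*, attachment otherwise."""
    disp = part.get_content_disposition()
    if disp is not None:
        return disp
    return "inline" if part.get_content_maintype() == "text" else "attachment"


def inline_leaves(msg):
    """Yield inline leaf parts breadth-first, so shallower ("topmost") parts come first."""
    queue = deque([msg])
    while queue:
        part = queue.popleft()
        if part.is_multipart():
            queue.extend(part.get_payload())  # list of sub-parts
        elif effective_disposition(part) == "inline":
            yield part


def pick_main_part(msg, preference=("text/plain", "text/html")):
    """Return the topmost inline leaf whose type comes earliest in `preference`, or None."""
    candidates = list(inline_leaves(msg))
    for wanted in preference:
        for part in candidates:
            if part.get_content_type() == wanted:
                return part
    return None


def sample():
    """Hypothetical message: alternatives plus a text/plain *attachment*."""
    msg = EmailMessage()
    msg.set_content("plain body")
    msg.add_alternative("<p>html body</p>", subtype="html")
    msg.add_attachment("notes, not the body", filename="notes.txt")
    return email.message_from_bytes(msg.as_bytes(), policy=policy.default)


if __name__ == "__main__":
    print(pick_main_part(sample()).get_content())
```

The default preference puts `text/plain` first, in line with the point about users who prefer plain text; pass `("text/html", "text/plain")` to favour HTML. The `notes.txt` part is also text/plain, but its explicit `attachment` disposition keeps it out of the candidates.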

Unfortunately, a common recent practice among message producers has been to create a multipart/alternative container with text/plain and text/html members, but then provide a completely useless text/plain part and put all the actual content in the text/html part. The correct arrangement in this situation would be to simply not supply a text/plain part if you can't put anything useful in it (but I guess they only care about getting past some misguided spam filter, not about actually accommodating the preferences of the recipients).
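
If you have to cope with such messages, one crude workaround is to prefer `text/plain` only when it is not much shorter than the visible text of the HTML alternative. The sketch below assumes Python 3.6+ and `policy.default`; the regex tag stripping and the `min_ratio=0.5` threshold are arbitrary illustrative choices, and a real implementation would want a proper HTML parser (this one also counts the contents of `<style>` and `<script>` blocks as visible text).

```python
import re
import email
from email import policy
from email.message import EmailMessage


def visible_text_length(html_source):
    """Very rough visible-character count: drop tags, collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", html_source)
    return len(" ".join(text.split()))


def choose_alternative(msg, min_ratio=0.5):
    """Prefer text/plain unless it looks like a placeholder next to a fuller HTML part."""
    plain = msg.get_body(preferencelist=("plain",))
    html = msg.get_body(preferencelist=("html",))
    if plain is None or html is None:
        return plain if plain is not None else html
    plain_len = len(" ".join(plain.get_content().split()))
    if plain_len >= min_ratio * visible_text_length(html.get_content()):
        return plain
    return html


def make_alt(plain_text, html_text):
    """Hypothetical multipart/alternative message, round-tripped through bytes."""
    msg = EmailMessage()
    msg.set_content(plain_text)
    msg.add_alternative(html_text, subtype="html")
    return email.message_from_bytes(msg.as_bytes(), policy=policy.default)


if __name__ == "__main__":
    lazy = make_alt("View this message in HTML.", "<p>" + "word " * 100 + "</p>")
    print(choose_alternative(lazy).get_content_type())  # text/html
```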
