What are the "parts" in a multipart email?

Question

Some time ago, I wrote a Python program that deals with email messages. One thing that always came up was knowing whether an email is "multipart" or not.

After a bit of research, I learned that it has something to do with emails containing HTML, attachments, etc., but I didn't really understand it.

1. When I had to save attachments from a raw email

I just found this on the internet (probably on here - Sorry for not crediting the person who wrote it but I can't seem to find him again :/) and pasted it in my code

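import os


def downloadAttachments(emailMsg, pathToSaveFile):
    """
    Save attachments to pathToSaveFile (example: pathToSaveFile = "C:\\Program Files\\")
    and return the list of saved file paths.
    """
    att_path_list = []
    for part in emailMsg.walk():
        # multipart/* parts are just containers, so we skip them
        if part.get_content_maintype() == 'multipart':
            continue

        # is this part an attachment? (parts without Content-Disposition are skipped)
        if part.get('Content-Disposition') is None:
            continue

        filename = part.get_filename()
        if not filename:
            # e.g. "Content-Disposition: inline" without a filename parameter
            continue

        # basename() guards against filenames that contain directory components
        att_path = os.path.join(pathToSaveFile, os.path.basename(filename))

        # Check if it's already there
        if not os.path.isfile(att_path):
            # decode=True undoes base64 / quoted-printable; None for message/* parts
            payload = part.get_payload(decode=True)
            if payload is None:
                continue
            # finally write the stuff
            with open(att_path, 'wb') as fp:
                fp.write(payload)
        att_path_list.append(att_path)
    return att_path_list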

2. When I had to get the text from a raw email

Also pasted from someone on the internet without really understanding how it works.

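def get_text(emailMsg):
    """
    Output: body of the email (text content), taken from the first leaf part
    """
    if emailMsg.is_multipart():
        # Descend into the first sub-part (not necessarily text/plain!)
        return get_text(emailMsg.get_payload(0))
    # Leaf part: undo the base64 / quoted-printable transfer encoding -> bytes
    payload = emailMsg.get_payload(decode=True)
    charset = emailMsg.get_content_charset() or 'utf-8'
    try:
        return payload.decode(charset, errors='replace')
    except LookupError:  # unknown charset name in the Content-Type header
        return payload.decode('utf-8', errors='replace')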

What I do understand is that if the email message is multipart, its parts can be iterated over.

What exactly are these parts? How do you know which one is html for example? Or which one is an attachment? Or just the body?

Recommended answer

There is no strict hierarchy or guidance for how exactly to use multipart messages. MIME simply defines a way to collect multiple payloads into a single email message. One of the original motivations, I believe, was to be able to embed pictures in text; but the ability to attach binaries to a text message, and more generally to create structured messages whose payloads are related in arbitrary ways, has simply been there for applications to use in whatever way they see fit.
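
To make "multiple payloads in a single email" concrete, here is a minimal sketch using Python's standard `email` package (the `EmailMessage` API with `policy.default`). The subject, body text, and the fake `chart.png` attachment are made-up sample values. It builds a message with a text body and a binary attachment, serializes it to the raw bytes a mail server would see, parses them back, and walks the parts.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Build a message: a plain-text body plus a (fake) PNG attachment.
# Subject, text and filename are illustrative sample values.
msg = EmailMessage()
msg["Subject"] = "Weekly report"
msg.set_content("Hi, the chart is attached.")
msg.add_attachment(b"\x89PNG not really an image",
                   maintype="image", subtype="png", filename="chart.png")

# Serialize to raw bytes (what travels over SMTP), then parse them back.
raw = msg.as_bytes()
parsed = message_from_bytes(raw, policy=policy.default)

# walk() yields the container itself first, then every sub-part, depth first.
content_types = [part.get_content_type() for part in parsed.walk()]
for ctype in content_types:
    print(ctype)
```

Adding the attachment turned the single text/plain message into a multipart/mixed container holding two parts, so the walk should visit multipart/mixed, then text/plain, then image/png.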

A common misunderstanding is to postulate a hierarchy of a "main" part and "subordinate" parts. It's certainly possible to create this structure, but it is by no means universally done. In fact, most multipart messages simply have a sequence of parts without any such hierarchy. The user's email client will commonly pick one of the "inline" parts as the preferred "main" part to display in a message pane, but this is by no means dictated by the standard, nor can the sending party enforce it.
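
Containers can still nest (a multipart/* part holds other parts, and only leaf parts carry payloads), which is a structural arrangement rather than a main/subordinate relationship. A small recursive sketch prints that structure for a message carrying plain-text and HTML alternatives plus an attachment; the helper name `part_tree` and all sample contents are illustrative assumptions, not something from the answer.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def part_tree(part, depth=0):
    """Hypothetical helper: one indented line per MIME part, container first."""
    lines = ["  " * depth + part.get_content_type()]
    if part.get_content_maintype() == "multipart":
        # iter_parts() yields the immediate children of a multipart container
        for child in part.iter_parts():
            lines.extend(part_tree(child, depth + 1))
    return lines


# Sample message: plain-text and HTML alternatives, plus a (fake) PDF attachment.
msg = EmailMessage()
msg["Subject"] = "Minutes"
msg.set_content("Minutes are attached.")
msg.add_alternative("<p>Minutes are <b>attached</b>.</p>", subtype="html")
msg.add_attachment(b"%PDF-1.4 fake", maintype="application",
                   subtype="pdf", filename="minutes.pdf")

parsed = message_from_bytes(msg.as_bytes(), policy=policy.default)
tree = part_tree(parsed)
print("\n".join(tree))
```

Which of the leaves ends up in the message pane is still the client's decision; nothing in the tree marks one of them as "main".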

Each MIME part has a set of headers which tell you its type, encoding, and disposition (Content-Type, Content-Transfer-Encoding, and Content-Disposition). For parts of type text/*, the default disposition is "inline" (so it is often not explicitly spelled out), whereas most other parts have a default disposition of "attachment". You'll need to refer to the pertinent standards (RFC 2045 and RFC 2046 for MIME, RFC 2183 for Content-Disposition) for a strict definition, but take them with a grain of salt, because many real-world applications are not particularly RFC-conformant.
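
A sketch of reading those three headers for every leaf part. The fallback in `effective_disposition` is a simplified version of the defaults described above (text/* inline, everything else attachment); both helper names and the sample message are illustrative assumptions, and real-world mail may not follow these defaults.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def effective_disposition(part):
    """Explicit Content-Disposition if present; otherwise a simplified default:
    'inline' for text/*, 'attachment' for everything else."""
    explicit = part.get_content_disposition()  # 'inline', 'attachment' or None
    if explicit is not None:
        return explicit
    return "inline" if part.get_content_maintype() == "text" else "attachment"


def describe_leaves(message):
    """One (type, transfer encoding, disposition, filename) tuple per leaf part."""
    rows = []
    for part in message.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        # RFC 2045: a missing Content-Transfer-Encoding header means 7bit
        cte = str(part.get("Content-Transfer-Encoding", "7bit")).lower()
        rows.append((part.get_content_type(), cte,
                     effective_disposition(part), part.get_filename()))
    return rows


msg = EmailMessage()
msg.set_content("See the logo attached.")
msg.add_attachment(b"GIF89a not a real image", maintype="image",
                   subtype="gif", filename="logo.gif")
parsed = message_from_bytes(msg.as_bytes(), policy=policy.default)

rows = describe_leaves(parsed)
for row in rows:
    print(row)
```

Here the text part has no Content-Disposition header at all, so it falls back to "inline", while the image carries an explicit "attachment" disposition and a base64 transfer encoding.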

For your concrete question: find the topmost leaf parts which are (implicitly or explicitly) inline, and display the one that best supports your use case as the "main" one. If you want to enforce HTML as the preferred format, you can do that; but many email applications leave this choice to the user, and some users will definitely prefer plain text when it's available, whether because of technical necessity, physical disabilities, or personal taste.
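
In Python, `EmailMessage.get_body(preferencelist=...)` implements a heuristic in the same spirit: it skips parts marked as attachments, descends into containers, and returns the best text/html or text/plain candidate according to the order you pass. It is the standard library's own rule, not necessarily identical to the one described above. The `main_part` wrapper and its `prefer_plain` flag are hypothetical names used to model a user preference.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def main_part(message, prefer_plain=False):
    """Hypothetical helper: pick the part to show in the message pane.

    get_body() ignores parts whose Content-Disposition is 'attachment' and
    returns None if no text/plain or text/html candidate exists.
    """
    order = ("plain", "html") if prefer_plain else ("html", "plain")
    return message.get_body(preferencelist=order)


msg = EmailMessage()
msg.set_content("Meeting moved to 3pm.")
msg.add_alternative("<p>Meeting moved to <b>3pm</b>.</p>", subtype="html")
msg.add_attachment(b"%PDF-1.4 fake", maintype="application",
                   subtype="pdf", filename="agenda.pdf")
parsed = message_from_bytes(msg.as_bytes(), policy=policy.default)

html_first = main_part(parsed)
plain_first = main_part(parsed, prefer_plain=True)
print(html_first.get_content_type(), plain_first.get_content_type())
print(plain_first.get_content())  # decoded text of the chosen part
```

Unlike the recursive `get_text()` from the question, this never returns the PDF attachment, and the user's preference decides between the two alternatives.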

Unfortunately, a common recent practice among message producers has been to create a multipart/alternative container with text/plain and text/html members, but then provide a completely useless text/plain part and put all the actual content in the text/html part. The correct arrangement in this situation would be to simply not supply a text/plain part if you can't put anything useful in it (but I guess they only care about getting past some misguided spam filter, not about actually accommodating the preferences of the recipients).
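
On the sending side, the arrangement recommended here can be sketched as follows: only wrap the content in multipart/alternative when there is real plain text to offer, and otherwise send a single text/html part. `build_message` is a hypothetical helper, and treating whitespace-only text as "nothing useful" is an assumption of this sketch.

```python
from email.message import EmailMessage


def build_message(html, plain=None):
    """Hypothetical helper: add a text/plain alternative only if there is
    real plain-text content for it; otherwise send a single text/html part."""
    msg = EmailMessage()
    if plain and plain.strip():
        msg.set_content(plain)                     # text/plain first...
        msg.add_alternative(html, subtype="html")  # ...preferred format last
    else:
        msg.set_content(html, subtype="html")      # no multipart needed
    return msg


html_only = build_message("<h1>Sale</h1><p>Everything 20% off.</p>")
both = build_message("<h1>Sale</h1><p>Everything 20% off.</p>",
                     plain="SALE\n\nEverything 20% off.")
print(html_only.get_content_type())
print([p.get_content_type() for p in both.walk()])
```

`add_alternative()` appends the HTML after the plain text, which matches RFC 2046's rule that multipart/alternative members appear in increasing order of preference, with the preferred format last.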
