Python: Parse email body and truncate MIME headers

Question

I have an email body that looks something like the string a in the code below.

Now I want to strip all the headers from it and keep only the text of the email conversation. How can I do that in Python?

I tried the email.parser module, but it isn't giving me the result I want.

See the code below for more details.

import email
a="""--c66f5985-233d-4e89-b598-6398b60cbe00
Content-Type: multipart/alternative;
     differences="Content-Type";
    boundary="d5eff9f8-76b3-4320-adfb-1e51add8fa8f"

--d5eff9f8-76b3-4320-adfb-1e51add8fa8f
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable

THis is a demo email body

Thanks And Regards,
Ana
"""



b = email.message_from_string(a)
if b.is_multipart():
    for payload in b.get_payload():
        # if payload.is_multipart(): ...
        print (payload.get_payload())
else:
    print (b.get_payload())
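A likely reason the parse above never splits into parts is that the string a starts at a MIME boundary line instead of the message's own headers. Without those headers, the parser never sees a top-level multipart Content-Type and treats the whole string as one plain payload. The sketch below assumes the missing outer part was multipart/mixed and re-attaches a header that declares the first boundary. extract_plain_text is a hypothetical helper. It walks every part and decodes the text/plain ones with get_payload(decode=True).

```python
import email
from email.message import Message

# The fragment from the question: it starts at a MIME boundary, so the
# top-level headers (including the outer boundary) are missing.
fragment = """--c66f5985-233d-4e89-b598-6398b60cbe00
Content-Type: multipart/alternative;
     differences="Content-Type";
    boundary="d5eff9f8-76b3-4320-adfb-1e51add8fa8f"

--d5eff9f8-76b3-4320-adfb-1e51add8fa8f
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable

THis is a demo email body

Thanks And Regards,
Ana
"""


def extract_plain_text(msg: Message) -> str:
    """Hypothetical helper: join the decoded text of every text/plain part."""
    chunks = []
    for part in msg.walk():  # walk() visits the message and all nested parts
        if part.get_content_type() == "text/plain":
            raw = part.get_payload(decode=True)  # undoes quoted-printable/base64
            charset = part.get_content_charset() or "utf-8"
            chunks.append(raw.decode(charset, errors="replace"))
    return "\n".join(chunks)


# Assumption: the outer part is multipart/mixed. Re-attach a top-level header
# declaring the first boundary so the parser sees a multipart message.
boundary = fragment.splitlines()[0][2:]  # drop the leading "--"
wrapped = f'Content-Type: multipart/mixed; boundary="{boundary}"\n\n' + fragment
msg = email.message_from_string(wrapped)

print(msg.is_multipart())
print(extract_plain_text(msg))
```

The closing boundaries are also missing from the fragment. Python's parser records that as a defect but still attaches the parts it found. A complete message straight from the mail server would not need the re-attached header at all.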

Answer

import imaplib
import email

hst = "your.host.address.com"
usr = "login"
pwd = "password"

imap = imaplib.IMAP4(hst)

try:
    imap.login(usr, pwd)
except Exception as e:
    raise IOError(e) from e

try:
    imap.select("Inbox")  # Tell IMAP which mailbox to open
    result, data = imap.uid('search', None, "ALL")
    latest = data[0].split()[-1]  # UID of the newest message
    result, data = imap.uid('fetch', latest, '(RFC822)')
    a = data[0][1]  # The raw mail data (bytes in Python 3)
except Exception as e:
    raise IOError(e) from e
finally:
    imap.logout()

# imaplib returns bytes, so parse with message_from_bytes()
msg = email.message_from_bytes(a)
if msg.is_multipart():
    # Keeps the payload of the last top-level part
    # (a list, if that part is itself multipart)
    for payload in msg.get_payload():
        b = payload.get_payload()
else:
    b = msg.get_payload()

print(b)
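On Python 3.6 and later, the parser can also choose the body part for you. This swaps in a different approach from the loop above. Parsing with policy=email.policy.default yields an EmailMessage. Its get_body() returns the preferred text part, and get_content() decodes that part to a str. plain_body is a hypothetical helper name. The sample message is built locally instead of fetched over IMAP, but the bytes in a from the fetch above could be passed in the same way.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def plain_body(raw: bytes) -> str:
    """Hypothetical helper: return the text/plain body, falling back to HTML."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    part = msg.get_body(preferencelist=("plain", "html"))
    # get_content() removes the transfer encoding and decodes the charset
    return part.get_content() if part is not None else ""


# Build a small multipart/alternative message locally for demonstration
sample = EmailMessage()
sample["Subject"] = "Demo"
sample.set_content("THis is a demo email body\n\nThanks And Regards,\nAna\n")
sample.add_alternative("<p>THis is a demo email body</p>", subtype="html")

print(plain_body(sample.as_bytes()))
```

If a mail has no text/plain part, this returns the HTML part instead, which can then go through the HTML step below.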

This strips everything from the mail that you don't want in the final text. I've tested it with your code. You didn't show how you load the mail (your a), so I suspect that's where your decoding problem comes from.
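On the decoding problem: get_payload() returns the body still in its transfer encoding, and the part in the question is quoted-printable. get_payload(decode=True) removes the transfer encoding and returns bytes, which you then decode with the part's charset. Here is a minimal illustration using a made-up quoted-printable body:

```python
import email

# Made-up message with a quoted-printable UTF-8 body and a soft line break
raw = (
    b"Content-Type: text/plain; charset=utf-8\n"
    b"Content-Transfer-Encoding: quoted-printable\n"
    b"\n"
    b"Caf=C3=A9 au lait, a long line that was wrapped with a soft =\n"
    b"line break.\n"
)

msg = email.message_from_bytes(raw)
print(msg.get_payload())             # still quoted-printable text

body = msg.get_payload(decode=True)  # bytes with the transfer encoding removed
text = body.decode(msg.get_content_charset() or "utf-8")
print(text)
```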

If you run into trouble with HTML emails:

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

soup = BeautifulSoup(b, 'html.parser')
text = soup.get_text()  # all text nodes, with the tags removed
print(text)

That should do the job for now, but I'd advise switching from Python's built-in html.parser to lxml or html5lib.
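If you would rather avoid a third-party dependency, you can build a rough tag stripper on the standard library's html.parser.HTMLParser instead of BeautifulSoup. This swaps in a different technique from the one the answer uses, and it is only a sketch. TextExtractor and html_to_text are hypothetical names. The sketch drops script and style content, and it does not add line breaks for block elements.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect text nodes, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes entities like &amp;
        self._chunks = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self._chunks.append(data)

    def text(self):
        return "".join(self._chunks)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.text()


print(html_to_text("<p>Hello <b>Ana</b></p><style>p{}</style>"))
```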
