Python: Parse email body and truncate MIME headers
Question
I have an email body that looks something like the sample in the code below.
Now I want to remove all the headers from it and keep only the conversation text of the email. How can I do this in Python?
I tried the email.parser module, but it isn't giving me the result I want.
Please see the code below for more information.
import email

a = """--c66f5985-233d-4e89-b598-6398b60cbe00
Content-Type: multipart/alternative;
 differences="Content-Type";
 boundary="d5eff9f8-76b3-4320-adfb-1e51add8fa8f"

--d5eff9f8-76b3-4320-adfb-1e51add8fa8f
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable

THis is a demo email body

Thanks And Regards,
Ana
"""

b = email.message_from_string(a)
if b.is_multipart():
    for payload in b.get_payload():
        # if payload.is_multipart(): ...
        print(payload.get_payload())
else:
    print(b.get_payload())
Recommended answer
import imaplib
import email

hst = "your.host.adresse.com"
usr = "login"
pwd = "password"

imap = imaplib.IMAP4(hst)
try:
    imap.login(usr, pwd)
except Exception as e:
    raise IOError(e)

try:
    imap.select("Inbox")  # Tell IMAP which mailbox to use
    result, data = imap.uid('search', None, "ALL")
    latest = data[0].split()[-1]  # UID of the most recent message
    result, data = imap.uid('fetch', latest, '(RFC822)')
    a = data[0][1]  # This contains the mail data (bytes in Python 3)
except Exception as e:
    raise IOError(e)

# imaplib returns bytes in Python 3, so parse with message_from_bytes
msg = email.message_from_bytes(a)
if msg.is_multipart():
    for payload in msg.get_payload():
        b = payload.get_payload()  # keeps the payload of the last part
else:
    b = msg.get_payload()
print(b)
This strips everything you don't want from the mail and leaves only the final text. I've tested this with your code. You didn't show how you import the mail (your a), so I guess that's where your decoding problem comes from.
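As a sketch of the same idea without an IMAP server, the standard library alone can walk the MIME parts and decode the transfer encoding. One likely reason the string in the question did not parse as multipart is that it begins with a boundary line rather than with message headers. The sample below is therefore a hypothetical complete message: the MIME-Version and Subject headers, the shortened boundary, the HTML part, and the helper name plain_text_body are all assumptions made for illustration.

```python
import email

# Hypothetical sample modelled on the question. The top-level headers and
# the shortened boundary are assumptions added so that it parses as a
# complete message.
RAW = """\
MIME-Version: 1.0
Subject: demo
Content-Type: multipart/alternative; boundary="d5eff9f8"

--d5eff9f8
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable

This is a demo email body (1 + 1 =3D 2)
Thanks And Regards,
Ana
--d5eff9f8
Content-Type: text/html; charset=us-ascii

<p>This is a demo email body</p>
--d5eff9f8--
"""


def plain_text_body(raw):
    """Return the decoded text/plain parts of a raw message, joined."""
    msg = email.message_from_string(raw)
    chunks = []
    for part in msg.walk():  # walk() also descends into nested multiparts
        if part.get_content_type() != "text/plain":
            continue
        # decode=True undoes quoted-printable or base64 and returns bytes
        data = part.get_payload(decode=True)
        charset = part.get_content_charset() or "utf-8"
        chunks.append(data.decode(charset, errors="replace"))
    return "\n".join(chunks)


print(plain_text_body(RAW))
```

Using walk() instead of a single loop over get_payload() means nested multipart containers are handled without extra checks, and filtering on the content type keeps the HTML alternative out of the result.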
If you have any trouble with HTML mails:
from bs4 import BeautifulSoup

soup = BeautifulSoup(b, 'html.parser')
text = soup.get_text()  # strip the tags, keep only the text
print(text)
That should do the job for now, but I'd advise you to switch from Python's built-in html.parser to lxml or html5lib.
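For HTML mails that also carry a text/plain alternative, it may be possible to skip HTML stripping entirely by asking the standard library for the preferred body part. The following is a minimal sketch, assuming Python 3.6+ and the modern email.policy.default API; the sample message and the helper name preferred_body are hypothetical.

```python
import email
from email import policy

# Hypothetical two-part message: a plain-text and an HTML alternative.
RAW = """\
MIME-Version: 1.0
Subject: demo
Content-Type: multipart/alternative; boundary="b1"

--b1
Content-Type: text/plain; charset=utf-8

Hello from the plain part
--b1
Content-Type: text/html; charset=utf-8

<p>Hello from the <b>HTML</b> part</p>
--b1--
"""


def preferred_body(raw):
    """Return (content_type, text) for the best body part, or None."""
    # policy.default yields an EmailMessage, which provides get_body()
    msg = email.message_from_string(raw, policy=policy.default)
    # Prefer text/plain; fall back to text/html only if no plain part exists
    part = msg.get_body(preferencelist=("plain", "html"))
    if part is None:
        return None
    return part.get_content_type(), part.get_content()


print(preferred_body(RAW))
```

If the returned content type is text/html, the text can still be passed through BeautifulSoup as shown in the answer.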