python email.message_from_string() parse problems
Question
My setup uses fetchmail to pull emails from Gmail; they are processed by procmail, which passes them to a Python script.
When I use `email.message_from_string()`, the resulting object is not parsed as an email object: `get_payload()` returns the header/body/payload text of the email as a single text blob.
The text it returns:
```
From example@gmail.com Sat Aug 17 19:20:44 2013
>From example Sat Aug 17 19:20:44 2013
MIME-Version: 1.0
Received: from ie-in-f109.1e100.net [74.125.142.109]
    by VirtualBox with IMAP (fetchmail-6.3.21)
    for <example@localhost> (single-drop); Sat, 17 Aug 2013 19:20:44 -0700 (PDT)
Received: by 10.70.131.110 with HTTP; Sat, 17 Aug 2013 19:20:42 -0700 (PDT)
Date: Sat, 17 Aug 2013 19:20:42 -0700
Delivered-To: example@gmail.com
Message-ID: <CAAsp4m0GBeVg80-ryFgNvNNAj_QPguzbX3DqvMSx-xSGZM18Pw@mail.gmail.com>
Subject: test 19:20
From: example <example@gmail.com>
To: example <example@gmail.com>
Content-Type: multipart/alternative; boundary=001a1133435474449004e42f7861

--001a1133435474449004e42f7861
Content-Type: text/plain; charset=ISO-8859-1

19:20

--001a1133435474449004e42f7861
Content-Type: text/html; charset=ISO-8859-1

<div dir="ltr">19:20</div>

--001a1133435474449004e42f7861--
```
My code:
```python
import email
import sys

full_msg = sys.stdin.read()                # raw message piped in by procmail
msg = email.message_from_string(full_msg)
msg['to']          # returns None
msg.get_payload()  # returns the text above
```
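A self-contained reproduction of the symptom, assuming Python 3 and using a trimmed-down, hypothetical stand-in for the input above (an envelope line, the `>From` line, a few headers and a plain-text body). Besides `msg['to']` and `get_payload()`, it prints `msg.defects`, the list the standard library fills in when it meets malformed input; that is one way to check where header parsing stopped.

```python
import email

# Hypothetical, trimmed-down stand-in for the input in the question:
# an mbox envelope line, the ">From" line, then ordinary headers and a body.
raw = (
    "From example@gmail.com Sat Aug 17 19:20:44 2013\n"
    ">From example Sat Aug 17 19:20:44 2013\n"
    "MIME-Version: 1.0\n"
    "Subject: test 19:20\n"
    "To: example <example@gmail.com>\n"
    "Content-Type: text/plain; charset=ISO-8859-1\n"
    "\n"
    "19:20\n"
)

msg = email.message_from_string(raw)

# The first line is accepted as the Unix "From " envelope line.
print(msg.get_unixfrom())
# No real headers were picked up, so the message looks like plain text.
print(msg["to"])           # None
print(msg.is_multipart())  # False
# The payload starts at the ">From" line and contains the real headers too.
print(msg.get_payload().splitlines()[0])
# The parser records why it stopped reading headers.
print([type(d).__name__ for d in msg.defects])
```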
What am I missing to get Python to properly interpret the email?
I see from these questions that I may not be getting the proper email headers somewhere along the line, but I cannot confirm it. The ">" at the start of line 2 is not a typo: it's in the text.
Recommended answer
Regardless of the ">" being "in the text" as you say (whatever that means), it's wrong. After removing this character:
```
> python test.py < input.txt
example <example@gmail.com>
[<email.message.Message instance at 0x02810288>,
 <email.message.Message instance at 0x02810058>]
```
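So the error is not in parsing the message, but in the ">" character corrupting your email text. Python's parser accepts a leading "From " line as the Unix envelope line, but a line starting with ">From " is neither that nor a "Name: value" header, so header parsing stops there and everything after it, including the real headers, ends up in the body.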
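If the delivery chain can't be changed to stop adding these lines, one workaround is to drop them in the script before parsing. This is a minimal sketch, assuming Python 3 and assuming that lines starting with "From " or ">From " only ever appear above the first real header, as in the question's input. `strip_envelope_lines` is a hypothetical helper, not part of the `email` package. A real header such as "From: ..." has a colon right after the name, so it is not matched.

```python
import email


def strip_envelope_lines(raw):
    """Return `raw` without leading mbox-style envelope lines.

    Hypothetical helper: drops every line at the very top that starts with
    "From " or ">From ", assuming such lines never appear after the first
    real header. Header lines like "From: ..." are kept, because they have
    a colon instead of a space after "From".
    """
    lines = raw.splitlines(True)  # keep the original line endings
    i = 0
    while i < len(lines) and lines[i].startswith(("From ", ">From ")):
        i += 1
    return "".join(lines[i:])


# Hypothetical, shortened copy of the question's input, including the blank
# lines that separate headers from bodies.
raw = (
    "From example@gmail.com Sat Aug 17 19:20:44 2013\n"
    ">From example Sat Aug 17 19:20:44 2013\n"
    "MIME-Version: 1.0\n"
    "Subject: test 19:20\n"
    "From: example <example@gmail.com>\n"
    "To: example <example@gmail.com>\n"
    "Content-Type: multipart/alternative; boundary=001a1133435474449004e42f7861\n"
    "\n"
    "--001a1133435474449004e42f7861\n"
    "Content-Type: text/plain; charset=ISO-8859-1\n"
    "\n"
    "19:20\n"
    "\n"
    "--001a1133435474449004e42f7861\n"
    "Content-Type: text/html; charset=ISO-8859-1\n"
    "\n"
    '<div dir="ltr">19:20</div>\n'
    "\n"
    "--001a1133435474449004e42f7861--\n"
)

msg = email.message_from_string(strip_envelope_lines(raw))
print(msg["to"])  # example <example@gmail.com>
for part in msg.get_payload():
    # Each MIME part is now its own Message object.
    print(part.get_content_type())
```

In the procmail-fed script, the parse call would then become `email.message_from_string(strip_envelope_lines(sys.stdin.read()))`.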