Problem with email parsing with Python and multiple Received records


Problem description

I am trying to parse emails with Python's email.parser. When an email contains multiple Received records, email.parser seems to ignore all but the first one.

For example, for the input:

...
Received: from localhost (jalapeno [127.0.0.1])
    by jmason.org (Postfix) with ESMTP id 5C4E816F6D
    for <jm@localhost>; Sun,  6 Oct 2002 22:54:39 +0100 (IST)
Received: from jalapeno [127.0.0.1]
    by localhost with IMAP (fetchmail-5.9.0)
    for jm@localhost (single-drop); Sun, 06 Oct 2002 22:54:39 +0100 (IST)
...

the output is:

...
Received ::: from localhost (jalapeno [127.0.0.1])
    by jmason.org (Postfix) with ESMTP id 5C4E816F6D
    for <jm@localhost>; Sun,  6 Oct 2002 22:54:39 +0100 (IST)
Received ::: from localhost (jalapeno [127.0.0.1])
    by jmason.org (Postfix) with ESMTP id 5C4E816F6D
    for <jm@localhost>; Sun,  6 Oct 2002 22:54:39 +0100 (IST)
...

I am using the following Python code:

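import email

# Read the raw message text and parse it into a Message object
with open('email.txt', 'r') as f:
    data = f.read()
e = email.message_from_string(data)

# keys() lists a repeated header name once per occurrence,
# but e[i] returns only one of that header's values
for i in e.keys():
    print(i, ':::', e[i])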

Is this a bug in email.parser?

Can you suggest any other Python library for parsing email?

Solution

The Python documentation for email.message.Message.__getitem__() says:

Note that if the named field appears more than once in the message’s headers, exactly which of those field values will be returned is undefined. Use the get_all() method to get the values of all the extant named headers.

So use e.get_all(i) instead of e[i] to get all values of the Received header.
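
As a minimal sketch of that advice: the snippet below parses a small inline message and reads the repeated header with get_all(). The RAW string is a hypothetical sample shortened from the question, and the Subject line is added only for illustration. It also shows items(), which returns one (name, value) pair per header line and so avoids the repeated lookup by name.

```python
import email

# Hypothetical sample message with two Received headers (shortened from the question).
RAW = (
    "Received: from localhost (jalapeno [127.0.0.1])\n"
    "    by jmason.org (Postfix) with ESMTP id 5C4E816F6D\n"
    "    for <jm@localhost>; Sun,  6 Oct 2002 22:54:39 +0100 (IST)\n"
    "Received: from jalapeno [127.0.0.1]\n"
    "    by localhost with IMAP (fetchmail-5.9.0)\n"
    "    for jm@localhost (single-drop); Sun, 06 Oct 2002 22:54:39 +0100 (IST)\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

msg = email.message_from_string(RAW)

# get_all() returns every value for the named header, in message order.
# The second argument is the fallback returned when the header is absent.
received = msg.get_all('Received', [])
for value in received:
    print('Received :::', value)

# items() yields one (name, value) pair per header line, duplicates included.
for name, value in msg.items():
    print(name, ':::', value)
```

Iterating over items() is probably the closest replacement for the loop in the question, since each Received line is then printed with its own value.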
