Python email quoted-printable encoding problem


Problem description

I am extracting emails from Gmail using the following:

import email
import imaplib
import sys

# username, password and subject are expected to be defined elsewhere.

def getMsgs():
  # Connect to Gmail over IMAP/SSL.
  try:
    conn = imaplib.IMAP4_SSL("imap.gmail.com", 993)
  except Exception:
    print('Failed to connect')
    print('Is your internet connection working?')
    sys.exit()
  try:
    conn.login(username, password)
  except Exception:
    print('Failed to login')
    print('Is the username and password correct?')
    sys.exit()

  conn.select('Inbox')
  # typ, data = conn.search(None, '(UNSEEN SUBJECT "%s")' % subject)
  typ, data = conn.search(None, '(SUBJECT "%s")' % subject)
  for num in data[0].split():
    typ, data = conn.fetch(num, '(RFC822)')
    # On Python 3 the fetched message is bytes, so parse it with message_from_bytes.
    msg = email.message_from_bytes(data[0][1])
    yield walkMsg(msg)

def walkMsg(msg):
  # Return the payload of the first text/plain part, still transfer-encoded.
  for part in msg.walk():
    if part.get_content_type() != "text/plain":
      continue
    return part.get_payload()

However, for some of the emails it is nearly impossible to extract dates with a regex, because encoding-related characters such as '=' land seemingly at random in the middle of various text fields. Here's an example where it occurs in a date range I want to extract:

Name: KIRSTI Email: kirsti@blah.blah Phone #: + 999 99995192 Total in party: 4 total, 0 children Arrival/Departure: Oct 9= , 2010 - Oct 13, 2010 - Oct 13, 2010

Is there a way to remove these encoding characters?

Recommended answer

You could (and should) use the email.parser module to decode mail messages. A quick and dirty example:

from email.parser import FeedParser

# Feed the raw message text (headers and body) to the parser.
f = FeedParser()
f.feed("<insert mail message here, including all headers>")
rootMessage = f.close()

# Now you can access the message and its submessages (if it's multipart)
print(rootMessage.is_multipart())

# Or check for errors
print(rootMessage.defects)

# If it's a multipart message, you can get the first submessage and then its payload
# (i.e. content) like so:
rootMessage.get_payload(0).get_payload(decode=True)
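
To see what the decode flag changes, here is a minimal sketch (Python 3) that parses a hand-written quoted-printable message and compares the raw payload with the decoded one. The headers and addresses are made up for illustration; only the date line is taken from the question.

```python
import email

# Hypothetical sample message. The body reuses the date range from the question,
# with the quoted-printable "=" sitting at the end of a line.
RAW = (
    "From: sender@example.com\n"
    "Subject: Booking\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "Arrival/Departure: Oct 9=\n"
    ", 2010 - Oct 13, 2010\n"
)

msg = email.message_from_string(RAW)

# Without decode, the payload is returned exactly as transmitted.
raw_body = msg.get_payload()

# With decode=True, the Content-Transfer-Encoding is undone and bytes are returned.
decoded_bytes = msg.get_payload(decode=True)

# Turn the bytes into text using the declared charset (falling back to UTF-8).
charset = msg.get_content_charset() or "utf-8"
decoded_body = decoded_bytes.decode(charset, errors="replace")

print(repr(raw_body))
print(repr(decoded_body))
```

Note that on Python 3 the decoded payload is a bytes object, so it still has to be converted to text with the part's charset before a regex written for str can be applied to it.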

When you pass decode=True to Message.get_payload, the module decodes the content according to its Content-Transfer-Encoding (e.g. quoted-printable, as in your question). The stray '=' in your sample is a quoted-printable soft line break, so it disappears once the payload has been decoded.
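
Applied to the walkMsg helper from the question, the same idea might look like the sketch below (Python 3). The function name first_plain_text, the sample message and the date regex are all hypothetical and only serve to illustrate the approach.

```python
import email
import re

def first_plain_text(msg):
    """Return the decoded text of the first text/plain part, or None."""
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        # decode=True undoes quoted-printable or base64 and returns bytes.
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        return payload.decode(charset, errors="replace")
    return None

# Hypothetical multipart message with a quoted-printable text/plain part.
RAW = """\
From: sender@example.com
Subject: Booking
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="BOUNDARY"

--BOUNDARY
Content-Type: text/html; charset="utf-8"

<p>ignored</p>
--BOUNDARY
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Arrival/Departure: Oct 9=
, 2010 - Oct 13, 2010
--BOUNDARY--
"""

text = first_plain_text(email.message_from_string(RAW))

# Illustrative pattern for dates such as "Oct 9, 2010".
dates = re.findall(r"[A-Z][a-z]{2} \d{1,2}, \d{4}", text)
print(text)
print(dates)
```

Because the decoding happens before any pattern matching, the regex only ever sees the text as the sender wrote it, without transfer-encoding artifacts.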
