Python email quoted-printable encoding problem

Views: 126 Published: 2020/10/29 0:08:13 Tags: python email encoding imaplib

Problem description

I am extracting emails from Gmail using the following:

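import email
import imaplib
import sys

def getMsgs():
  # username, password and subject are expected to be defined elsewhere
  try:
    conn = imaplib.IMAP4_SSL("imap.gmail.com", 993)
  except Exception:
    print('Failed to connect')
    print('Is your internet connection working?')
    sys.exit()
  try:
    conn.login(username, password)
  except Exception:
    print('Failed to login')
    print('Is the username and password correct?')
    sys.exit()

  conn.select('Inbox')
  # typ, data = conn.search(None, '(UNSEEN SUBJECT "%s")' % subject)
  typ, data = conn.search(None, '(SUBJECT "%s")' % subject)
  for num in data[0].split():
    # use a separate name so the search result is not overwritten
    typ, msg_data = conn.fetch(num, '(RFC822)')
    # on Python 3 the fetched message is bytes, hence message_from_bytes
    msg = email.message_from_bytes(msg_data[0][1])
    yield walkMsg(msg)

def walkMsg(msg):
  # return the body of the first text/plain part, exactly as transmitted
  for part in msg.walk():
    if part.get_content_type() != "text/plain":
      continue
    return part.get_payload()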

However, for some of the emails I get it is nearly impossible to extract dates with a regex, because encoding-related characters such as '=' land seemingly at random in the middle of various text fields. Here's an example where this occurs inside a date range I want to extract:

Name: KIRSTI Email: kirsti@blah.blah Phone #: + 999 99995192 Total in party: 4 total, 0 children Arrival/Departure: Oct 9= , 2010 - Oct 13, 2010 - Oct 13, 2010

Is there a way to remove these encoding characters?
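
For context, a trailing '=' like the one in "Oct 9=" is most likely a quoted-printable soft line break: the encoding marks a wrapped line by ending it with '='. The sketch below illustrates this with the standard quopri module. The raw_body value is a made-up sample modeled on the example above, and the position of the line break is an assumption.

```python
import quopri

# Hypothetical raw body modeled on the example above. The "=" that ends the
# first line is a quoted-printable soft line break, not part of the text.
raw_body = b"Arrival/Departure: Oct 9=\n, 2010 - Oct 13, 2010"

# Decoding removes the "=" together with the line break that follows it.
decoded = quopri.decodestring(raw_body).decode("ascii")
print(decoded)
```

Decoding the body this way joins the two halves of the wrapped line again, so a date regex sees the date as it was originally written.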

Solution

You could/should use the email.parser module to decode mail messages, for example (quick and dirty example!):

from email.parser import FeedParser

f = FeedParser()
f.feed("<insert mail message here, including all headers>")
rootMessage = f.close()

# Now you can access the message and its submessages (if it's multipart)
print(rootMessage.is_multipart())

# Or check for errors
print(rootMessage.defects)

# If it's a multipart message, you can get the first submessage and then its payload
# (i.e. content) like so; with decode=True the result is bytes on Python 3:
rootMessage.get_payload(0).get_payload(decode=True)

With decode=True, Message.get_payload decodes the content according to the part's Content-Transfer-Encoding header (e.g. quoted-printable, as in your question) instead of returning it as transmitted.
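
To show how this applies to the walkMsg function from the question, here is a minimal Python 3 sketch. RAW_MESSAGE, its headers, and the helper name first_plain_text are made up for illustration, and the UTF-8 fallback for parts that declare no charset is an assumption rather than something the email package requires.

```python
import email

# Hypothetical message modeled on the question's example; the headers and
# body are invented for illustration.
RAW_MESSAGE = (
    "From: kirsti@blah.blah\n"
    "Subject: Booking\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "Arrival/Departure: Oct 9=\n"
    ", 2010 - Oct 13, 2010\n"
)


def first_plain_text(msg):
    """Return the decoded text of the first text/plain part, or None."""
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        # decode=True undoes the transfer encoding and returns bytes
        payload = part.get_payload(decode=True)
        # fall back to UTF-8 when no charset is declared (an assumption)
        charset = part.get_content_charset() or "utf-8"
        return payload.decode(charset, errors="replace")
    return None


msg = email.message_from_string(RAW_MESSAGE)
raw_text = msg.get_payload()         # still contains the "=" soft line break
clean_text = first_plain_text(msg)   # soft line break removed
print(repr(raw_text))
print(repr(clean_text))
```

Returning a str rather than bytes keeps the result directly usable with a text regex. The errors="replace" choice trades strictness for robustness against mislabelled charsets, which may or may not suit a given mailbox.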
