Decode utf8 mail header

Question

In my MUA (Thunderbird 15.0.1) both mail subjects are displayed like this:

Keine Mail zu "Abschlagsänderung" gefunden

Here is a snippet to reproduce it:

import email
import email.header

for subject in ['Subject: Re: Keine Mail zu "=?utf-8?q?Abschlags=C3=A4nderung?=" gefunden',
                'Subject: =?utf-8?q?Keine_Mail_zu_=22Abschlags=C3=A4nderung=22_gefunden?=']:
    # Each string is a one-line message consisting of just the Subject header
    msg = email.message_from_string(subject)
    print email.header.decode_header(msg.get('subject'))

Output:

[('Re: Keine Mail zu "=?utf-8?q?Abschlags=C3=A4nderung?=" gefunden', None)]
[('Keine Mail zu "Abschlags\xc3\xa4nderung" gefunden', 'utf-8')]
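decode_header returns a list of (text, charset) pairs, not a single string. The following is a minimal sketch written for Python 3 rather than the question's Python 2.7; the helper name `pairs_to_text` is hypothetical. In Python 3, encoded parts come back as `bytes`, while a header without any encoded-words comes back as a plain `str` with charset `None`, so the helper accepts both.

```python
from email.header import decode_header


def pairs_to_text(pairs):
    """Join (value, charset) pairs from decode_header into one str.

    Hypothetical helper, not part of the email package. A value may be
    bytes (encoded-word or text next to one) or str (no encoded-words).
    """
    chunks = []
    for value, charset in pairs:
        if isinstance(value, bytes):
            # charset is None for unencoded parts; assume plain ASCII there
            value = value.decode(charset or "ascii", errors="replace")
        chunks.append(value)
    return "".join(chunks)


second = "=?utf-8?q?Keine_Mail_zu_=22Abschlags=C3=A4nderung=22_gefunden?="
print(pairs_to_text(decode_header(second)))
```

For the second subject this yields `Keine Mail zu "Abschlagsänderung" gefunden`, the same text Thunderbird displays.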

Python cannot parse the first header, but Thunderbird can. The header was created by KMail/1.11.4.

How can I parse the first header with umlauts in Python 2.7?

Answer

According to RFC 2047:

An 'encoded-word' MUST NOT appear within a 'quoted-string'.

According to RFC 822, a 'quoted-string' is:

quoted-string = <"> *(qtext/quoted-pair) <">; Regular qtext or quoted chars.
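The grammar can be approximated with a regular expression to check whether the first subject really puts an encoded-word inside a quoted-string. This is a simplified Python 3 sketch, not a full RFC 822 parser: `QUOTED_STRING` only models qtext (anything except `"`, `\` and CR) and quoted-pair (a backslash plus any character), it ignores line folding, and the function name `encoded_word_in_quotes` is hypothetical.

```python
import re

# quoted-string = <"> *(qtext / quoted-pair) <">  (RFC 822, simplified:
# qtext = any char except '"', '\' and CR; quoted-pair = '\' + any char)
QUOTED_STRING = re.compile(r'"(?:[^"\\\r]|\\.)*"')
# Rough shape of an RFC 2047 encoded-word: =?charset?Q-or-B?encoded-text?=
ENCODED_WORD = re.compile(r'=\?[^?\s]+\?[QqBb]\?[^?\s]*\?=')


def encoded_word_in_quotes(header_value):
    """Hypothetical check: True if any quoted-string contains an encoded-word."""
    return any(ENCODED_WORD.search(m.group(0))
               for m in QUOTED_STRING.finditer(header_value))


first = 'Re: Keine Mail zu "=?utf-8?q?Abschlags=C3=A4nderung?=" gefunden'
second = '=?utf-8?q?Keine_Mail_zu_=22Abschlags=C3=A4nderung=22_gefunden?='
print(encoded_word_in_quotes(first))   # True: the case RFC 2047 forbids
print(encoded_word_in_quotes(second))  # False: the quotes are encoded as =22
```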

So I think the Python library is right, as

"=?utf-8?q?Abschlags=C3=A4nderung?="

is a quoted string. A better alternative with minimal quoting would be

=?utf-8?q?=22Abschlags=C3=A4nderung=22?=

where " is encoded as =22.
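The encoded-word above can be reproduced with a hand-written Q encoder. In this Python 3 sketch (the function name `q_encode_word` is hypothetical, and it does no line folding or 75-character limit), every UTF-8 byte outside the set RFC 2047 allows to stand for itself in a phrase (letters, digits and `! * + - /`) becomes `=XX`, and a space becomes `_`. Because `"` is not in that set, it turns into `=22`.

```python
import string

# Bytes RFC 2047 lets stand for themselves in a Q-encoded word used in a phrase
SAFE = set((string.ascii_letters + string.digits + "!*+-/").encode("ascii"))


def q_encode_word(text, charset="utf-8"):
    """Hypothetical helper: encode text as one Q-encoded word (no folding)."""
    out = []
    for byte in text.encode(charset):
        if byte in SAFE:
            out.append(chr(byte))
        elif byte == 0x20:
            out.append("_")             # a space is written as underscore
        else:
            out.append("=%02X" % byte)  # anything else as =XX (uppercase hex)
    return "=?%s?q?%s?=" % (charset, "".join(out))


print(q_encode_word('"Abschlagsänderung"'))
# =?utf-8?q?=22Abschlags=C3=A4nderung=22?=
```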

You could parse them by replacing each " with the encoded-word =?utf-8?q?=22?=:

>>> email.Header.decode_header('=?utf-8?q?=22?= =?utf-8?q?Abschlags=C3=A4nderung?= =?utf-8?q?=22?=')
[('"Abschlags\xc3\xa4nderung"', 'utf-8')]
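The same trick can be wrapped in a helper. This is a Python 3 sketch, not the answerer's code: `decode_quoted_subject` is a hypothetical name, and it only replaces a `"` that directly touches an encoded-word, leaving other quotes alone. As in the interactive example, the inserted encoded-word is separated from its neighbour by a space; RFC 2047 says whitespace between two adjacent encoded-words is ignored when decoding, so that space does not appear in the result.

```python
import re
from email.header import decode_header

QUOTE_WORD = "=?utf-8?q?=22?="  # a double quote written as its own encoded-word

# A literal double quote immediately before / after an encoded-word
QUOTE_BEFORE = re.compile(r'"(?==\?)')
QUOTE_AFTER = re.compile(r'(?<=\?=)"')


def decode_quoted_subject(value):
    """Hypothetical helper: decode a header whose encoded-words sit in quotes."""
    # Replace each offending quote with QUOTE_WORD, keeping a space so the
    # decoder sees whitespace between adjacent encoded-words.
    patched = QUOTE_BEFORE.sub(QUOTE_WORD + " ", value)
    patched = QUOTE_AFTER.sub(" " + QUOTE_WORD, patched)
    text = []
    for part, charset in decode_header(patched):
        if isinstance(part, bytes):
            # Unencoded parts have charset None; assume ASCII for them
            part = part.decode(charset or "ascii", errors="replace")
        text.append(part)
    return "".join(text)


subject = 'Re: Keine Mail zu "=?utf-8?q?Abschlags=C3=A4nderung?=" gefunden'
print(decode_quoted_subject(subject))
```

Replacing only quotes adjacent to `=?` and `?=` keeps ordinary quoted text untouched; a header that merely contains quotes without encoded-words passes through unchanged.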
