Decode UTF-8 mail header
Question
In my MUA (Thunderbird 15.0.1) both mail subjects are displayed like this:
Keine Mail zu "Abschlagsänderung" gefunden
Here is a snippet to reproduce it:
import email
for subject in ['Subject: Re: Keine Mail zu "=?utf-8?q?Abschlags=C3=A4nderung?=" gefunden',
                'Subject: =?utf-8?q?Keine_Mail_zu_=22Abschlags=C3=A4nderung=22_gefunden?=']:
    msg = email.message_from_string(subject)
    print email.Header.decode_header(msg.get('subject'))
Output:
[('Re: Keine Mail zu "=?utf-8?q?Abschlags=C3=A4nderung?=" gefunden', None)]
[('Keine Mail zu "Abschlags\xc3\xa4nderung" gefunden', 'utf-8')]
Python can't parse the first header, but Thunderbird can. It was created by KMail/1.11.4.
How can I parse the first header with umlauts in Python 2.7?
Recommended answer
According to RFC 2047,
An 'encoded-word' MUST NOT appear within a 'quoted-string'.
According to RFC 822, a quoted-string is
quoted-string = <"> *(qtext/quoted-pair) <">; Regular qtext or quoted chars.
So I think the Python library is right, as
"=?utf-8?q?Abschlags=C3=A4nderung?="
is a quoted string. A better alternative with minimal quoting would be
=?utf-8?q?=22Abschlags=C3=A4nderung=22?=
where each " is encoded as =22.
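As a rough illustration of the sender side, the sketch below lets the standard library's email.header.Header encode the whole subject, so the quotes end up inside the encoded text instead of around it. The subject string is taken from the question; the variable names are only illustrative, and the exact encoded form may differ between Python versions.

```python
# -*- coding: utf-8 -*-
from email.header import Header, decode_header

# Example subject taken from the question (a-umlaut written as an escape).
subject = u'Keine Mail zu "Abschlags\u00e4nderung" gefunden'

# Encode the whole subject as RFC 2047 encoded-word(s). The double quotes
# are encoded along with the text, so no quoted-string wraps an encoded-word.
encoded = Header(subject, 'utf-8').encode()

# Round-trip check: decode_header yields (value, charset) pairs.
decoded = u''.join(
    value.decode(charset or 'ascii') if isinstance(value, bytes) else value
    for value, charset in decode_header(encoded)
)

print(encoded)
print(decoded == subject)
```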
You could parse them by replacing each " with =?utf-8?q?=22?=, keeping a space between adjacent encoded-words:
>>> email.Header.decode_header('=?utf-8?q?=22?= =?utf-8?q?Abschlags=C3=A4nderung?= =?utf-8?q?=22?=')
[('"Abschlags\xc3\xa4nderung"', 'utf-8')]
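The replacement can be wrapped in a small helper. This is only a sketch: the function name decode_header_with_quoted_words and the constant ENCODED_QUOTE are made up for illustration, and it assumes that only quotes directly touching an encoded-word need rewriting. Because decode_header handles whitespace around plain-text parts differently across Python versions, the helper joins the parts with spaces and collapses whitespace runs, so the result may not preserve the original spacing exactly.

```python
# -*- coding: utf-8 -*-
import re
from email.header import decode_header

# A double quote written as its own RFC 2047 encoded-word.
ENCODED_QUOTE = '=?utf-8?q?=22?='


def decode_header_with_quoted_words(raw):
    """Decode a header value whose encoded-words are wrapped in quotes.

    Hypothetical helper, not part of the email package.
    """
    # Opening quote directly before an encoded-word ("=?...).
    patched = re.sub(r'"(?==\?)', ENCODED_QUOTE + ' ', raw)
    # Closing quote directly after an encoded-word (...?=").
    patched = re.sub(r'(?<=\?=)"', ' ' + ENCODED_QUOTE, patched)

    chunks = []
    for value, charset in decode_header(patched):
        # Byte strings are decoded with the declared charset (ASCII if none).
        if isinstance(value, bytes):
            value = value.decode(charset or 'ascii', 'replace')
        chunks.append(value)

    # Join with spaces and collapse whitespace runs (see note above).
    return u' '.join(u' '.join(chunks).split())


raw = 'Re: Keine Mail zu "=?utf-8?q?Abschlags=C3=A4nderung?=" gefunden'
print(repr(decode_header_with_quoted_words(raw)))
```

Headers without encoded-words pass through unchanged apart from the whitespace normalisation.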