Python decoding from ISO-8859-5
Problem description
When I parse my email messages with Python's email.parser.Parser, I get a lot of strings like this:
=?ISO-8859-5?Q?=C0=D5=D5=E1=E2=E0_=BF=DB=D0=E2=D5=D6=D5=D9_?=
How can I decode this to UTF-8 using Python?
Recommended answer
Your input is quoted-printable encoded text (strictly, the "Q" variant of quoted-printable that RFC 2047 defines for mail headers). You can use the quopri module to handle that:
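import quopri

incode = '=?ISO-8859-5?Q?=C0=D5=D5=E1=E2=E0_=BF=DB=D0=E2=D5=D6=D5=D9_?='
inencoding = incode[2:12]  # 'ISO-8859-5'
intext = incode[15:-2]     # the encoded payload between '?Q?' and '?='
# header=True decodes '_' as a space, as the header ("Q") encoding requires;
# decodestring() returns bytes, so *decode* them with the declared charset
result = quopri.decodestring(intext.encode('ascii'), header=True).decode(inencoding)
The result is
Реестр Платежей
(plus a trailing space, because the final "_" in the payload is also an encoded space).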
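The fixed slice indices above only work because "ISO-8859-5" happens to be ten characters long; a header in another charset (for example UTF-8) would be cut in the wrong places. A minimal sketch that locates the parts with a regular expression instead, assuming a single Q-encoded word with no RFC 2231 language tag. The helper name decode_q_word is hypothetical, and B (base64) words are deliberately rejected rather than handled:

```python
import quopri
import re

# One RFC 2047 encoded-word, =?charset?encoding?payload?=
# (simplified: language tags such as 'UTF-8*en' are not supported)
ENCODED_WORD = re.compile(r'=\?([^?]+)\?([QqBb])\?([^?]*)\?=')


def decode_q_word(word):
    """Decode a single Q-encoded word into a str (sketch, Q encoding only)."""
    match = ENCODED_WORD.fullmatch(word.strip())
    if match is None:
        raise ValueError('not an encoded-word: %r' % word)
    charset, encoding, payload = match.groups()
    if encoding.upper() != 'Q':
        raise ValueError('only Q encoding is handled in this sketch')
    # header=True turns '_' into a space, as RFC 2047 specifies for Q encoding
    raw = quopri.decodestring(payload.encode('ascii'), header=True)
    # The bytes are only meaningful in the charset named by the encoded-word
    return raw.decode(charset)


print(repr(decode_q_word('=?ISO-8859-5?Q?=C0=D5=D5=E1=E2=E0_=BF=DB=D0=E2=D5=D6=D5=D9_?=')))
print(repr(decode_q_word('=?UTF-8?Q?caf=C3=A9?=')))
```

Real headers often contain several encoded words, or encoded words mixed with plain text; email.header.decode_header already copes with that, so this sketch is mainly useful for seeing how the pieces fit together.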
Around the quoted-printable text there is also the email-header "encoded-word" syntax, =?charset?Q?text?=, which names the character set the bytes must be interpreted in once the quoted-printable decoding has been applied. The first example takes the string apart "manually" with fixed slices, but you can also do everything in one step:
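from email.header import decode_header

incode = '=?ISO-8859-5?Q?=C0=D5=D5=E1=E2=E0_=BF=DB=D0=E2=D5=D6=D5=D9_?='
# One encoded word yields one (bytes, charset) pair; here charset is 'iso-8859-5'
[(text, encoding)] = decode_header(incode)
result = text.decode(encoding)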
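result now holds "Реестр Платежей ": each "_", including the trailing one, is decoded as a space. In Python 3 this is a str; if you really need UTF-8 bytes, call result.encode('utf-8').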
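Since the question starts from email.parser.Parser, it may be simpler to decode headers as part of parsing. A sketch, assuming Python 3.6 or later and a made-up test message: with the parser's default compat32 policy the Subject comes back still encoded and can be decoded with decode_header plus make_header, whereas passing policy=email.policy.default makes the parser return decoded header strings directly.

```python
from email.header import decode_header, make_header
from email.parser import Parser
from email.policy import default

# Made-up message for illustration; only the Subject line matters here
raw_message = (
    'Subject: =?ISO-8859-5?Q?=C0=D5=D5=E1=E2=E0_=BF=DB=D0=E2=D5=D6=D5=D9_?=\n'
    'From: sender@example.com\n'
    '\n'
    'Body text\n'
)

# 1) Legacy (compat32) parsing: the header value is still the raw encoded-word.
#    make_header() joins all decoded (bytes, charset) chunks into one Header,
#    and str() turns that into a Unicode string.
legacy = Parser().parsestr(raw_message)
subject_legacy = str(make_header(decode_header(legacy['Subject'])))

# 2) With policy=default, header values are decoded to str automatically.
modern = Parser(policy=default).parsestr(raw_message)
subject_modern = str(modern['Subject'])

print(repr(subject_legacy))
print(repr(subject_modern))
```

make_header() is handy when a header mixes encoded words and plain ASCII: decode_header() then returns several (text, charset) pairs, and the one-element unpacking used in the answer above would raise a ValueError.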