Python email encoding and decoding problems


Question

Basically, I want to read all new emails from an inbox and put them in a database. I use Python because it has imaplib, but I know nothing about it.

Currently, I have something like this:

def primitive_get_text_blocks(email_message_instance):
    maintype = email_message_instance.get_content_maintype()
    if maintype == 'multipart':
        return_parts = ""
        for part in email_message_instance.get_payload():
            if part.get_content_maintype() == 'text':
                return_parts += " " + part.get_payload()
        return return_parts
    elif maintype == 'text':
        return email_message_instance.get_payload()
    return ""

fromField = con.escape(email_message["From"])
contentField = con.escape(primitive_get_text_blocks(email_message))

primitive_get_text_blocks is copy-pasted from somewhere. The result is that I get database entries like this:

<META http-equiv=3D"Content-Type" content=3D"text/html; charset=3DUTF-8">

From what I understand, that has something to do with the text being encoded in utf-7. So I changed to get_payload(decode=True), but that gives me byte arrays. If I append another decode('utf-8'), it sometimes crashes with errors like

'codec error can't decode to ...'.

I don't know how encodings work; I only want a unicode string with the body of my email.

Why is there no simple convert(charset from, charset to)? How do I get a readable email body (and address)? I found the question "IMAP Fetch Encoding", but using decode_header got me no further.

-

I assume an encoding is the way bytes represent characters, so with that in mind, shouldn't decode take a byte array and spit out a string? Here on Stack Overflow I also came across somebody claiming it had something to do with being encoded with utf-8 and utf-7. What does that even mean?

I did google this, and there appear to be tons of duplicates, but the answers they got didn't really help me (I've tried most of them).

Recommended answer

Turns out it's quite easy. Even though much of the documentation still points to the glorious past when the unicode function was a real thing, in Python 3 str does the same job: str(some_bytes, 'utf-8') decodes bytes into a string.
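As a small illustration of that point: the =3D sequences in the question come from the quoted-printable transfer encoding rather than from UTF-7. The sketch below, which uses the standard quopri module directly on the sample line from the question (normally get_payload(decode=True) does this step for you), shows the two stages: first undo the transfer encoding to get bytes, then decode those bytes to a str.

```python
import quopri

# The sample line from the question, as raw quoted-printable bytes.
raw = b'<META http-equiv=3D"Content-Type" content=3D"text/html; charset=3DUTF-8">'

# Stage 1: undo the transfer encoding (=3D becomes "="); the result is still bytes.
decoded_bytes = quopri.decodestring(raw)

# Stage 2: turn bytes into text. str(b, 'utf-8') and b.decode('utf-8') are equivalent.
text = str(decoded_bytes, "utf-8")
same_text = decoded_bytes.decode("utf-8")

print(text)
```
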

So to recap: pass decode=True to get_payload and wrap the result in str(..., 'utf-8').
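The recap assumes every message is UTF-8, which would explain the occasional "can't decode" crash mentioned in the question when a mail declares a different charset. A hedged sketch of a more tolerant variant follows. The names get_text_blocks and decode_header_value are made up for this example, the sample message is invented, and the fallback to UTF-8 with errors="replace" is an assumption about what is acceptable for your database, not the original answer's method.

```python
import email
from email.header import decode_header, make_header


def get_text_blocks(message):
    """Return all text/* parts of a message joined into one str (hypothetical helper)."""
    blocks = []
    for part in message.walk():  # walk() also visits nested multipart containers
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True)  # bytes with transfer encoding undone
        if payload is None:
            continue
        # Use the charset the part declares; assume UTF-8 when none is given.
        charset = part.get_content_charset() or "utf-8"
        try:
            blocks.append(payload.decode(charset, errors="replace"))
        except LookupError:  # the declared charset is unknown to Python
            blocks.append(payload.decode("utf-8", errors="replace"))
    return " ".join(blocks)


def decode_header_value(value):
    """Decode RFC 2047 encoded words such as =?utf-8?b?...?= (hypothetical helper)."""
    return str(make_header(decode_header(value)))


# Invented sample message: Latin-1 body, quoted-printable, encoded sender name.
raw = (
    b"From: =?utf-8?b?SsO8cmdlbg==?= <j@example.com>\r\n"
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"caf=E9\r\n"
)
msg = email.message_from_bytes(raw)
body = get_text_blocks(msg)
sender = decode_header_value(msg["From"])
print(repr(body), repr(sender))
```

With errors="replace", undecodable bytes become U+FFFD instead of raising, so a single malformed mail does not stop the import; use errors="strict" if you would rather detect such messages.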
