Python email encoding and decoding problems

Views: 255 | Posted: 2020/7/13 3:21:41 | Tags: python, email, encoding, utf-8, character-encoding


Problem description

Basically, I want to read all new emails from an inbox and put them in a database. I am using Python because it has imaplib, but I know nothing about it.

Currently, I have something like this:

def primitive_get_text_blocks(email_message_instance):
    maintype = email_message_instance.get_content_maintype()
    if maintype == 'multipart':
        return_parts = ""
        for part in email_message_instance.get_payload():
            if part.get_content_maintype() == 'text':
                return_parts += " " + part.get_payload()
        return return_parts
    elif maintype == 'text':
        return email_message_instance.get_payload()
    return ""

fromField = con.escape(email_message["From"])
contentField = con.escape(primitive_get_text_blocks(email_message))

primitive_get_text_blocks was copy-pasted from somewhere. The result is that I get database entries like this:

<META http-equiv=3D"Content-Type" content=3D"text/html; charset=3DUTF-8">
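The `=3D` sequences in the entry above are what the quoted-printable transfer encoding produces for a literal `=` character. That encoding is applied on top of the character set and is separate from it. As a minimal sketch, assuming the stored text really is quoted-printable and the underlying bytes are UTF-8 (as the `charset` attribute in the sample suggests), the standard-library `quopri` module can undo it:

```python
import quopri

# The database entry from the question, as raw bytes.
raw = b'<META http-equiv=3D"Content-Type" content=3D"text/html; charset=3DUTF-8">'

# Step 1: undo the quoted-printable transfer encoding (bytes in, bytes out).
decoded_bytes = quopri.decodestring(raw)

# Step 2: turn the bytes into text, assuming they are UTF-8.
decoded_text = decoded_bytes.decode("utf-8")

print(decoded_text)
# <META http-equiv="Content-Type" content="text/html; charset=UTF-8">
```

Calling `get_payload(decode=True)` on a message part performs the first step automatically, based on the part's Content-Transfer-Encoding header.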

From what I understand, that has something to do with being encoded in utf-7. So I changed to get_payload(decode=True), but that gives me byte arrays. If I append another decode('utf-8'), it sometimes crashes with errors like 'codec error can't decode to ...'.
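One plausible cause of the intermittent crash is that not every message is UTF-8: each text part can declare its own charset in its Content-Type header. The sketch below is one way to handle that, not a definitive implementation. The function name `get_text_blocks`, the `fallback_charset` parameter, and the sample message are made up for illustration. The library calls (`walk`, `get_payload(decode=True)`, `get_content_charset`) are part of the standard `email` package.

```python
from email import message_from_bytes


def get_text_blocks(msg, fallback_charset="utf-8"):
    """Return the decoded text of every text/* part, joined by spaces."""
    blocks = []
    for part in msg.walk():  # visits the message itself and all nested parts
        if part.get_content_maintype() != "text":
            continue
        # decode=True undoes quoted-printable/base64 and returns bytes.
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        # Use the charset the part declares; fall back if none is given.
        charset = part.get_content_charset() or fallback_charset
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:  # the declared charset is unknown to Python
            text = payload.decode(fallback_charset, errors="replace")
        blocks.append(text)
    return " ".join(blocks)


# Hypothetical sample message: Latin-1 text sent as quoted-printable.
raw = (
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"caf=E9 =3D coffee\r\n"
)
body = get_text_blocks(message_from_bytes(raw))
print(body.strip())
```

Passing `errors="replace"` trades accuracy for robustness: bytes that cannot be decoded become replacement characters instead of raising an exception, which may or may not be acceptable for data going into a database.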

I don't know how encodings work; I only want a unicode string with the body of my email.

Why is there no simple convert(charset from, charset to)? How do I get a readable email body (and address)? I found "IMAP Fetch Encoding", but using decode_header got me no further.
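For the address part of the question, header values such as From may carry RFC 2047 "encoded words" (`=?charset?b?...?=`). A minimal sketch, assuming a header of that shape: split the header with `email.utils.parseaddr` first, then decode only the display name with `decode_header` and `make_header`. The sample header value and the address in it are invented for illustration.

```python
from email.header import decode_header, make_header
from email.utils import parseaddr

# Hypothetical raw From header with a base64 encoded-word display name.
raw_from = "=?utf-8?b?SsO8cmdlbg==?= <juergen@example.com>"

# Split into display name and address before decoding anything.
encoded_name, address = parseaddr(raw_from)

# decode_header yields (bytes, charset) pairs; make_header joins them
# into a Header object whose str() is the readable unicode text.
name = str(make_header(decode_header(encoded_name)))

print(name, address)
```

Parsing before decoding keeps special characters inside a decoded display name from being mistaken for address syntax.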

I assume an encoding is the way bytes represent characters, so with that in mind, shouldn't decode take a byte array and spit out a string? Here on Stack Overflow I came across somebody claiming it had something to do with being encoded with utf-8 and utf-7. What does that even mean?
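That mental model matches how Python 3 works: `bytes.decode(charset)` turns bytes into a `str`, and `str.encode(charset)` goes the other way. The short sketch below illustrates this, including why decoding with the wrong charset fails. The `convert` helper is hypothetical, written here only to mirror the convert(charset from, charset to) idea from the question.

```python
text = "café"

# The same text becomes different bytes under different encodings.
utf8_bytes = text.encode("utf-8")      # b'caf\xc3\xa9'
latin1_bytes = text.encode("latin-1")  # b'caf\xe9'

# Decoding with the matching charset recovers the original string.
assert utf8_bytes.decode("utf-8") == text
assert latin1_bytes.decode("latin-1") == text


def convert(data, from_charset, to_charset):
    """Hypothetical helper: re-encode bytes from one charset to another."""
    return data.decode(from_charset).encode(to_charset)


converted = convert(latin1_bytes, "latin-1", "utf-8")

# Decoding with the wrong charset is what raises the codec error.
try:
    latin1_bytes.decode("utf-8")
    wrong_charset_failed = False
except UnicodeDecodeError:
    wrong_charset_failed = True

print(converted, wrong_charset_failed)
```

A conversion always passes through text, which is why the source charset has to be known: the bytes alone do not say which encoding produced them.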

I did google this, and there appear to be tons of duplicates, but the answers they got didn't really help me (I've tried most of them).
