Why does this Python program send empty emails when I encode it with utf-8?

Problem description

Before encoding the msg variable, I was getting this error:

UnicodeEncodeError: 'ascii' codec can't encode character '\xfc' in position 4: ordinal not in range(128)

So I did some research, and finally encoded the variable:

msg = (os.path.splitext(base)[0] + ': ' + text).encode('utf-8')
server.sendmail('...@gmail.com', '...@gmail.com', msg)

Here's the rest of the code on request:

import glob
import os
import random
import smtplib

import docx  # python-docx


def remind_me(path, time, day_freq):
    for filename in glob.glob(os.path.join(path, '*.docx')):
        # file_count = sum(len(files))
        # random_file = random.randint(0, file_number-1)
        doc = docx.Document(filename)
        p_number = len(doc.paragraphs)

        text = ''
        while text == '':
            rp = random.randint(0, p_number-1)  # random paragraph number
            text = doc.paragraphs[rp].text  # gives the entire text in the paragraph

        base = os.path.basename(filename)
        print(os.path.splitext(base)[0] + ': ' + text)
        server = smtplib.SMTP('smtp.gmail.com', 587)
        server.starttls()
        server.login('...@gmail.com', 'password')
        msg = (os.path.splitext(base)[0] + ': ' + text).encode('utf-8')
        server.sendmail('...@gmail.com', '...@gmail.com', msg)
        server.quit()

Now, it sends empty emails instead of delivering the message. Does it return None? If so, why?

Note: Word documents contain some characters like ş, ö, ğ, ç.

Recommended answer

The msg argument to smtplib.sendmail should be a complete, valid RFC 5322 message, passed either as an ASCII-only string or as a bytes sequence. Taking a string and encoding it as UTF-8 is very unlikely to produce one (if it's already ASCII, encoding it does nothing useful; and if it isn't, you are most probably Doing It Wrong).
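
A minimal sketch of both halves of that statement, using hypothetical stand-ins ("notlar" for the file-name stem and a short Turkish sentence for the paragraph). smtplib.sendmail() encodes a str msg with the ASCII codec, which is where the original UnicodeEncodeError comes from; encoding to UTF-8 first avoids the exception but still leaves a single line with no header block and no blank line before a body.

```python
# Hypothetical stand-ins for os.path.splitext(base)[0] and the paragraph text.
name = "notlar"
text = "Güneş doğdu"
line = name + ": " + text

# smtplib.sendmail() encodes a str msg with the ASCII codec, so any
# non-ASCII character fails the same way as in the question.
try:
    line.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError as exc:
    ascii_ok = False
    print("ASCII encoding failed:", exc)

# Encoding to UTF-8 avoids the exception, but the result is still one line
# of raw bytes: no header block and no blank line separating a body.
raw = line.encode("utf-8")
print(raw)
print("has header/body separator:", b"\n\n" in raw or b"\r\n\r\n" in raw)
```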

To explain why that is unlikely to work, let me provide a bit of background. The way to transport non-ASCII strings in MIME messages depends on the context of the string in the message structure. Here is a simple message with the word "Hëlló" embedded in three different contexts which require different encodings, none of which accept raw UTF-8 easily.

From: me <sender@example.org>
To: you <recipient@example.net>
Subject: =?utf-8?Q?H=C3=ABll=C3=B3?= (RFC2047 encoding)
MIME-Version: 1.0
Content-type: multipart/mixed; boundary="fooo"

--fooo
Content-type: text/plain; charset="utf-8"
Content-transfer-encoding: quoted-printable

H=C3=ABll=C3=B3 is bare quoted-printable (RFC2045),
like what you see in the Subject header but without
the RFC2047 wrapping.

--fooo
Content-type: application/octet-stream; filename*=UTF-8''H%C3%ABll%C3%B3

This is a file whose name has been RFC2231-encoded.

--fooo--
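
The three contexts above can be reproduced with standard-library helpers. This is only a sketch: email.header.Header chooses between Q and B encoding on its own, so its output may not match the hand-written Subject line character for character, while the body and parameter forms should.

```python
import email.header
import email.utils
import quopri

word = "Hëlló"

# Header context: an RFC 2047 encoded word (Header() picks Q or B itself).
subject = email.header.Header(word, "utf-8").encode()

# Body context: bare quoted-printable (RFC 2045) of the UTF-8 bytes.
body = quopri.encodestring(word.encode("utf-8")).decode("ascii")

# Parameter context: RFC 2231 encoding, as used for filename*=.
filename_param = email.utils.encode_rfc2231(word, "utf-8")

for label, value in [("Subject", subject), ("Body", body), ("filename*", filename_param)]:
    print(f"{label}: {value}")

# Decoding the header form gives back the original word.
decoded = str(email.header.make_header(email.header.decode_header(subject)))
print(decoded)
```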

There are recent extensions (SMTPUTF8, RFC 6531 and RFC 6532) which allow parts of messages exchanged between conforming systems to contain bare UTF-8 (even in the headers!), but I strongly suspect that this is not the scenario you are in. See also, tangentially, https://en.wikipedia.org/wiki/Unicode_and_email
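
To make the difference concrete, a small sketch comparing how the standard library serializes a non-ASCII Subject under the default policy and under email.policy.SMTPUTF8. The subject and body are made-up values, and raw UTF-8 headers are only safe when every server on the path supports SMTPUTF8.

```python
import email.policy
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Hëlló"
msg.set_content("Body text")

# Default policy: the non-ASCII subject becomes an RFC 2047 encoded word.
classic = msg.as_bytes(policy=email.policy.default)

# SMTPUTF8 policy (utf8=True): the header keeps raw UTF-8 bytes.
utf8 = msg.as_bytes(policy=email.policy.SMTPUTF8)

print(classic.decode("ascii").splitlines()[0])
print(utf8.decode("utf-8").splitlines()[0])
```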

Returning to your code, I suppose it could work if base is coincidentally also the name of a header you want to add to the start of the message, and text contains a string with the rest of the message. You are not showing enough of your code to reason intelligently about this, but it seems highly unlikely. As it stands, a receiving program will most likely read the file name as a header name and the paragraph as that header's value, and since no blank line follows, the message ends up with no body at all. And if text already contained a valid MIME message, encoding it as UTF-8 should not be necessary or useful (but it clearly doesn't contain one, as you get the encoding error).
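
One way to see what happens to the question's message is to parse the same kind of bytes with email.message_from_bytes. This is a sketch of how a receiving program is likely to read it, not a trace of what Gmail does internally; "notlar" and the ASCII paragraph are hypothetical stand-ins (ASCII keeps the header value a plain string).

```python
import email

# Hypothetical values standing in for the file-name stem and the paragraph.
raw = "notlar: Hello from the document".encode("utf-8")

# Everything before the first blank line is the header block; there is no
# blank line here, so the whole line is taken as a header.
parsed = email.message_from_bytes(raw)

print(parsed.keys())               # the file name became a header name
print(repr(parsed.get_payload()))  # and nothing is left for the body
```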

Let's suppose base contains "Subject" and text is defined thus:

text='''=?utf-8?Q?H=C3=ABll=C3=B3?= (RFC2047 encoding)
MIME-Version: 1.0
Content-type: multipart/mixed; boundary="fooo"
....'''

Now, the concatenation base + ': ' + text actually produces a message similar to the one above (though I reordered some headers to put Subject: first for this scenario) but again, I imagine this is not how things actually are in your code.
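
The contrived scenario can be checked the same way. This sketch assumes base is "Subject" and that text holds the encoded subject, the remaining headers, a blank line and a body; a plain text/plain body stands in for the multipart one to keep it short.

```python
import email
from email.header import decode_header, make_header

# Assumed contrived values: base happens to be a header name, and text
# carries the rest of a well-formed message.
base = "Subject"
text = (
    "=?utf-8?Q?H=C3=ABll=C3=B3?=\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "\n"
    "Body text\n"
)

msg = email.message_from_string(base + ": " + text)

# Decode the RFC 2047 subject and read the body after the blank line.
subject = str(make_header(decode_header(msg["Subject"])))
body = msg.get_payload()
print(subject, "|", body)
```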

If your goal is to send an extracted piece of text as the body of an email message, the way to do that is roughly

import os
import smtplib
from email.mime.text import MIMEText

# base and text come from the loop in the question's remind_me() function.
body_text = os.path.splitext(base)[0] + ': ' + text
sender = 'you@example.net'
recipient = 'me@example.org'

message = MIMEText(body_text)
message['Subject'] = 'Extracted text'
message['From'] = sender
message['To'] = recipient
server = smtplib.SMTP('smtp.gmail.com', 587)
# ... smtplib setup, login, authenticate?
server.send_message(message)
server.quit()

The MIMEText() invocation builds an email object with room for a sender, a subject, a list of recipients, and a body; its as_string() method returns a representation which looks roughly similar to the ad hoc example message above (though simpler still, with no multipart structure) and which is suitable for transmitting over SMTP. It transparently takes care of putting in the correct character set and applying suitable content-transfer encodings for non-ASCII header elements and body parts (payloads).
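
A small sketch of that behaviour with made-up Turkish text and placeholder addresses: when no charset is passed and the text is not ASCII, MIMEText falls back to utf-8 with a base64 body, and a non-ASCII Subject is turned into an encoded word, so the serialized message is plain ASCII.

```python
from email.mime.text import MIMEText

# Made-up body containing the kind of characters found in the .docx files.
body_text = "notlar: Güneş doğdu, çiçekler açtı."

# No charset given: MIMEText picks utf-8 because the text is not ASCII.
message = MIMEText(body_text)
message["Subject"] = "Hatırlatma: öğren"  # non-ASCII header, encoded on output
message["From"] = "you@example.net"
message["To"] = "me@example.org"

wire = message.as_string()
print(wire)
```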

Python's standard library contains fairly low-level functions, so you have to know a fair bit in order to connect all the pieces correctly. There are third-party libraries which hide some of this nitty-gritty, but you would expect anything to do with email to have, at the very least, both a subject and a body, as well as, of course, a sender and recipients.
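
For comparison, a sketch of the same idea using email.message.EmailMessage, the newer standard-library API (Python 3.6+), which the answer does not use. The helper name build_reminder, the file path and the addresses are hypothetical, and cte="quoted-printable" is an explicit choice to keep the body 7-bit clean. Sending would still go through smtplib.SMTP(...).send_message(msg) after starttls() and login(), as in the question.

```python
import os
from email.message import EmailMessage


def build_reminder(filename, paragraph, sender, recipient):
    """Hypothetical helper: wrap one extracted paragraph in a proper message."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    msg = EmailMessage()
    msg["Subject"] = "Reminder: " + stem  # non-ASCII is RFC 2047-encoded on output
    msg["From"] = sender
    msg["To"] = recipient
    # Explicit quoted-printable so the serialized message stays ASCII-only.
    msg.set_content(stem + ": " + paragraph, cte="quoted-printable")
    return msg


msg = build_reminder("/docs/çalışma.docx", "Güneş doğdu.",
                     "you@example.net", "me@example.org")
print(msg.as_string())
```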
