Python3 utf-8 decode issue

Question

The following code runs fine with Python3 on my Windows machine and prints the character 'é':

data = b"\xc3\xa9"

print(data.decode('utf-8'))

However, running the same code in an Ubuntu-based Docker container results in:

UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 0: ordinal not in range(128)

Is there anything I have to install to enable UTF-8 decoding?

Recommended answer

The problem is with the print() call, not with the decode() method. If you look closely, the exception raised is a UnicodeEncodeError, not a UnicodeDecodeError.
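The distinction can be checked without involving the terminal at all. In this small sketch (the variable names are only illustrative), decoding the bytes succeeds, while encoding the resulting str with the ASCII codec, which is effectively what print() does in the container, raises the error from the question:

```python
data = b"\xc3\xa9"

# Decoding turns bytes into str; this step succeeds on any platform.
text = data.decode("utf-8")

# Encoding turns str back into bytes. ASCII cannot represent 'é',
# so this raises the same kind of error as in the question.
try:
    text.encode("ascii")
    error_message = None
except UnicodeEncodeError as exc:
    error_message = str(exc)

# ascii() escapes non-ASCII characters, so these prints are safe
# even on a terminal that only accepts ASCII.
print(ascii(text))
print(error_message)
```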

Whenever you call print(), Python converts its arguments to str and then encodes the result to bytes, which are sent to the terminal (or whatever Python is running in). The codec used for encoding (e.g. UTF-8 or ASCII) depends on the environment. Ideally:

  • the codec Python uses is compatible with the one the terminal expects, so the characters are displayed correctly (otherwise you get mojibake such as "Ã©" instead of "é");
  • the codec covers a range of characters sufficient for your needs (UTF-8 and UTF-16, for example, can encode every Unicode character).
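To see which codec your own environment picked, something like the following may help. It only uses the standard sys and locale modules; the printed values depend on the machine, so none are shown here. The last lines reproduce the mojibake mentioned in the first point by decoding UTF-8 bytes with Latin-1, chosen here purely as an example of a mismatched codec:

```python
import locale
import sys

# The codec print() uses when writing to standard output. It may be None
# if sys.stdout was replaced by an object without an encoding attribute.
stdout_encoding = getattr(sys.stdout, "encoding", None)

# The encoding Python derives from the locale settings.
preferred = locale.getpreferredencoding(False)

print("stdout encoding:", stdout_encoding)
print("locale preferred encoding:", preferred)

# Mojibake: UTF-8 bytes interpreted with a different single-byte codec.
utf8_bytes = "\xe9".encode("utf-8")      # the two bytes C3 A9
mojibake = utf8_bytes.decode("latin-1")  # two characters instead of one
print(ascii(mojibake))
```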

In your case, the second condition isn't met in the Linux Docker container you mention: the encoding used is ASCII, which only supports the characters found on an old English typewriter. Here are a few options to address this problem:

  • Set environment variables: on Linux, Python's encoding defaults depend on them (at least partially). In my experience this takes a bit of trial and error; setting LC_ALL to something containing "UTF-8" worked for me once. You'll have to put the setting in the start-up script of the shell your terminal runs, e.g. .bashrc.
  • Re-encode STDOUT, like so:

import sys
sys.stdout = open(sys.stdout.buffer.fileno(), 'w', encoding='utf8')

The encoding used has to match the one the terminal expects.
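On Python 3.7 and later, text streams also offer a reconfigure() method, which may be a lighter alternative to reopening the file descriptor. The sketch below is not part of the original answer; it uses an in-memory stream as a stand-in for an ASCII-only sys.stdout so that it behaves the same everywhere, and force_utf8 is a hypothetical helper name:

```python
import io


def force_utf8(stream):
    """Switch a text stream to UTF-8 if it supports reconfigure() (Python 3.7+)."""
    if hasattr(stream, "reconfigure"):
        stream.reconfigure(encoding="utf-8")
    return stream


# Stand-in for a stdout that was set up with the ASCII codec.
raw = io.BytesIO()
ascii_stream = io.TextIOWrapper(raw, encoding="ascii", newline="\n")

force_utf8(ascii_stream)

# Without the call above, this print would raise UnicodeEncodeError.
print("\xe9", file=ascii_stream)
ascii_stream.flush()

written = raw.getvalue()
print(written)
```

In a real program the call would presumably be force_utf8(sys.stdout), under the same condition as before: the terminal must actually expect UTF-8.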

There might be other options, but I doubt that there are nicer ones.
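How strongly the environment influences print() can be reproduced without Docker. The following sketch runs the snippet from the question in a child interpreter twice, setting PYTHONIOENCODING (a standard CPython environment variable that the answer above does not mention) first to ascii and then to utf-8; run_with_encoding is a hypothetical helper:

```python
import os
import subprocess
import sys

# The code from the question, passed to a child interpreter via -c.
SNIPPET = r'print(b"\xc3\xa9".decode("utf-8"))'


def run_with_encoding(encoding):
    """Run SNIPPET in a child Python whose standard streams use `encoding`."""
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    return subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )


ascii_run = run_with_encoding("ascii")
utf8_run = run_with_encoding("utf-8")

# The ASCII run fails inside print(); the UTF-8 run writes the encoded bytes.
print("ascii exit code:", ascii_run.returncode)
print("utf-8 exit code:", utf8_run.returncode)
print("utf-8 output bytes:", utf8_run.stdout)
```

Only the environment differs between the two runs, which is consistent with the answer's point that nothing needs to be installed.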
