Force UTF-8 output using Python


Problem description

I'm getting the following error:

UnicodeEncodeError: 'ascii' codec can't encode character '\xd7' in position 31: ordinal not in range(128)

from this code:

test_string = """
Antelope Canyon, Arizona [1600×1068] </a>&#32; <span class="domain">(<a
"""

print(test_string)

Output of sys.getdefaultencoding():

In [6]: sys.getdefaultencoding()
Out[10]: 'utf-8'

I'm using a Chromebook with crouton, in case that makes a difference (I have a feeling that it might).

I'm not sure whether there's some way of 'forcing' the output of strings like this, or of just ignoring any characters that are problematic.

Your terminal, console, or redirect cannot handle UTF-8. What environment are you trying to print in?

I'm trying to run this using IPython within Spacemacs:

In [22]: sys.stdout.encoding
Out[27]: 'ANSI_X3.4-1968'

In the shell, what does the command locale output?

In the shell I'm running this in (IPython within Spacemacs), the locale command is undefined. In the default shell brought up with Ctrl+Alt+T, the output is:

$ locale
LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

Recommended answer

On a POSIX host, Python determines the output encoding from the locale, a set of environment variables that communicate how the environment is configured for various language settings. See the locale.getdefaultlocale() function or, more specifically, the locale.getpreferredencoding() function.

The output of that function is used to set sys.stdout.encoding, which is then used to encode any Unicode text you print.
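To see what Python settled on in a given environment, a small diagnostic along these lines may help. This is only a sketch: the helper name describe_output_encoding is made up for illustration, and getattr is used because a replaced sys.stdout (for example under a test runner) may not carry these attributes.

```python
import locale
import sys


def describe_output_encoding():
    """Collect the encodings Python chose for this process (hypothetical helper)."""
    return {
        # Encoding derived from the locale environment variables.
        "preferred": locale.getpreferredencoding(),
        # Encoding and error handler actually attached to standard output.
        "stdout": getattr(sys.stdout, "encoding", None),
        "stdout_errors": getattr(sys.stdout, "errors", None),
    }


if __name__ == "__main__":
    for name, value in describe_output_encoding().items():
        print(f"{name}: {value}")
```

Under a POSIX locale such as the one shown in the question, the stdout entry would be expected to report an ASCII variant like the ANSI_X3.4-1968 value seen above.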

Your locale is set to POSIX, which means that the default encoding is ASCII. You'll need to configure that locale to use an encoding that supports all of Unicode. I don't know how to do this for Chromebooks. On my Mac, the locale is mostly set to en_US.UTF-8, so my terminal supports all of the Unicode standard. You could force the issue by running export LC_CTYPE=en_US.UTF-8 in the shell before starting Python.
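The effect of an ASCII output encoding can be reproduced without touching the locale. The sketch below wraps an in-memory buffer in a text stream, roughly the way sys.stdout wraps the real output; write_as is a hypothetical helper, and the sample string is shortened from the question.

```python
import io

text = "Antelope Canyon, Arizona [1600×1068]"


def write_as(encoding, text):
    """Write text through a text stream with the given encoding; return the bytes."""
    buffer = io.BytesIO()
    # errors is left at its default, 'strict', so unencodable characters raise.
    stream = io.TextIOWrapper(buffer, encoding=encoding)
    stream.write(text)
    stream.flush()
    return buffer.getvalue()


if __name__ == "__main__":
    print(write_as("utf-8", text))  # UTF-8 can represent the multiplication sign
    try:
        write_as("ascii", text)
    except UnicodeEncodeError as exc:
        # Same class of error as in the question.
        print("ASCII stream failed:", exc.reason)
```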

You can override Python's choices by setting the PYTHONIOENCODING environment variable.
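One way to check that the variable takes effect is to start a child interpreter with it set and ask the child what encoding its standard output uses. This is a minimal sketch; child_stdout_encoding is a hypothetical helper, and it assumes sys.executable points at a usable Python interpreter.

```python
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Run a child Python with PYTHONIOENCODING set and report its stdout encoding."""
    # Copy the current environment and add the override for the child only.
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(child_stdout_encoding("utf-8"))
```

In an interactive shell the equivalent is to export PYTHONIOENCODING=utf-8 before launching Python or IPython.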

Note that on more recent Python 3 releases, sys.stdout and sys.stderr use the backslashreplace error handler, which replaces any character your console can't handle with the standard \xhh, \uhhhh and \Uhhhhhhhh escape sequences; so instead of an exception you'd see:

Antelope Canyon, Arizona [1600\xd71068] </a>&#32; <span class="domain">(<a 
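The same error handlers can be applied by hand when encoding a string, which also covers the question's idea of simply ignoring problematic characters. A short sketch, using a shortened version of the string from the question:

```python
text = "Antelope Canyon, Arizona [1600×1068]"

# backslashreplace: unencodable characters become escape sequences.
escaped = text.encode("ascii", errors="backslashreplace").decode("ascii")

# ignore: unencodable characters are silently dropped.
dropped = text.encode("ascii", errors="ignore").decode("ascii")

# replace: unencodable characters become a question mark.
replaced = text.encode("ascii", errors="replace").decode("ascii")

if __name__ == "__main__":
    print(escaped)   # Antelope Canyon, Arizona [1600\xd71068]
    print(dropped)   # Antelope Canyon, Arizona [16001068]
    print(replaced)  # Antelope Canyon, Arizona [1600?1068]
```

Dropping or replacing characters loses information, so the escaped form is usually the safer fallback when the output encoding cannot be changed.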
