How to print UTF-8 encoded text to the console in Python < 3?

Problem description

I'm running a recent Linux system where all my locales are UTF-8:

LANG=de_DE.UTF-8
LANGUAGE=
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
...
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=

Now I want to write UTF-8 encoded content to the console.

Right now Python uses UTF-8 for the FS encoding but sticks to ASCII for the default encoding :-(

>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> sys.getfilesystemencoding()
'UTF-8'

I thought the best (clean) way to do this was setting the PYTHONIOENCODING environment variable. But it seems that Python ignores it. At least on my system, I keep getting ascii as the default encoding, even after setting the envvar.

# tried this in ~/.bashrc and ~/.profile (also sourced them)
# and on the commandline before running python
export PYTHONIOENCODING=UTF-8

If I do the following at the start of a script, it works though:

>>> import sys
>>> reload(sys)  # to enable `setdefaultencoding` again
<module 'sys' (built-in)>
>>> sys.setdefaultencoding("UTF-8")
>>> sys.getdefaultencoding()
'UTF-8'

But that approach seems unclean. So, what's a good way to accomplish this?

Workaround

Instead of changing the default encoding, which is not a good idea (see mesilliac's answer), I just wrap sys.stdout with a StreamWriter like this:

import codecs, locale, sys
sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout)

See this gist for a small utility function that handles it.

Solution

How to print UTF-8 encoded text to the console in Python < 3?

print u"some unicode text \N{EURO SIGN}"
print b"some utf-8 encoded bytestring \xe2\x82\xac".decode('utf-8')

That is: if you have a Unicode string, print it directly; if you have a bytestring, convert it to Unicode first.
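The rule above can be sketched as a small helper. It is written to run on both Python 2.7 and Python 3 via __future__ imports. The name to_text is hypothetical, not part of any library, and the sketch assumes bytestrings are UTF-8 unless told otherwise.

```python
from __future__ import print_function, unicode_literals


def to_text(value, encoding="utf-8"):
    """Return value as a Unicode string (hypothetical helper).

    Bytestrings are decoded with `encoding` (UTF-8 is an assumption here);
    Unicode strings are returned unchanged.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


euro_text = "some unicode text \N{EURO SIGN}"               # already Unicode
euro_bytes = b"some utf-8 encoded bytestring \xe2\x82\xac"  # raw UTF-8 bytes

# Only ever hand Unicode to print; the interpreter encodes it for the console.
print(to_text(euro_text))
print(to_text(euro_bytes))
```

On Python 2, print encodes Unicode with sys.stdout.encoding. When stdout is piped and that attribute is None, print falls back to the ASCII default encoding, so non-ASCII text raises UnicodeEncodeError unless PYTHONIOENCODING is set.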

Your locale settings (LANG, LC_CTYPE) indicate a UTF-8 locale, so in theory you could print a UTF-8 bytestring directly and it would be displayed correctly in your terminal (provided the terminal settings are consistent with the locale settings, as they should be). You should avoid this, though: do not hardcode the character encoding of your environment inside your script; print Unicode directly instead.

There are many wrong assumptions in your question.

With your locale settings, you do not need to set PYTHONIOENCODING to print Unicode to the terminal. A UTF-8 locale supports all Unicode characters, i.e., it works as is.

You do not need the workaround sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout). It may break if some code (that you do not control) needs to print bytes, and/or it may break while printing Unicode to the Windows console (wrong codepage, can't print undecodable characters). Correct locale settings and/or the PYTHONIOENCODING envvar are enough. Also, if you need to replace sys.stdout, use io.TextIOWrapper() instead of the codecs module, as the win-unicode-console package does.
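Here is a minimal sketch of the io.TextIOWrapper() approach. It writes into an in-memory io.BytesIO so the result can be checked without a console. The helper name wrap_text and the errors="replace" choice are illustrative assumptions; neither is taken from win-unicode-console.

```python
from __future__ import unicode_literals

import io


def wrap_text(binary_stream, encoding="utf-8", errors="replace"):
    """Layer an io.TextIOWrapper over a binary stream (hypothetical helper).

    Text written to the wrapper is encoded with `encoding`; characters the
    encoding cannot represent are handled according to `errors`.
    """
    return io.TextIOWrapper(binary_stream, encoding=encoding,
                            errors=errors, line_buffering=True)


raw = io.BytesIO()
out = wrap_text(raw)
out.write("price: \N{EURO SIGN}5\n")  # the newline triggers a flush (line_buffering)
utf8_bytes = raw.getvalue()           # starts with b"price: \xe2\x82\xac5"

raw_ascii = io.BytesIO()
out_ascii = wrap_text(raw_ascii, encoding="ascii")
out_ascii.write("price: \N{EURO SIGN}5\n")
ascii_bytes = raw_ascii.getvalue()    # errors="replace" turns the euro sign into "?"
```

To wrap the real console on Python 3, you would pass sys.stdout.buffer. Python 2's sys.stdout has no .buffer attribute, so you would first need a binary io stream over the same file descriptor there, for example io.open(sys.stdout.fileno(), "wb", closefd=False).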

sys.getdefaultencoding() is unrelated to your locale settings and to PYTHONIOENCODING. Your assumption that setting PYTHONIOENCODING should change sys.getdefaultencoding() is incorrect. You should check sys.stdout.encoding instead.
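To see these values side by side, the sketch below collects the encodings that are easy to confuse in one place. The helper name describe_encodings is hypothetical. Under the question's setup on Python 2, you would expect "default" to be ascii while "stdout" follows the terminal, the locale, or PYTHONIOENCODING.

```python
from __future__ import print_function

import locale
import sys


def describe_encodings():
    """Collect the encodings that are easy to confuse (hypothetical helper)."""
    return {
        # Drives implicit str <-> unicode coercion on Python 2, e.g. "a" + u"b".
        "default": sys.getdefaultencoding(),
        # What print uses to encode Unicode for the console or a pipe
        # (None on Python 2 when stdout is redirected).
        "stdout": getattr(sys.stdout, "encoding", None),
        # Used for file names, not for console output.
        "filesystem": sys.getfilesystemencoding(),
        # What the locale settings (LANG, LC_CTYPE, ...) ask for.
        "locale": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for name, value in sorted(describe_encodings().items()):
        print("{0:<10} {1}".format(name, value))
```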

sys.getdefaultencoding() is not used when you print to the console. On Python 2, it may be used as a fallback when stdout is redirected to a file or pipe, unless PYTHONIOENCODING is set:

$ python2 -c'import sys; print(sys.stdout.encoding)'
UTF-8
$ python2 -c'import sys; print(sys.stdout.encoding)' | cat
None
$ PYTHONIOENCODING=utf8 python2 -c'import sys; print(sys.stdout.encoding)' | cat
utf8
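The transcript above can also be reproduced from a script. This sketch starts a child interpreter (sys.executable) with its stdout connected to a pipe, which is what "| cat" does, and returns the child's sys.stdout.encoding. The helper name child_stdout_encoding is hypothetical. The None result is Python 2 behavior; a Python 3 child reports an encoding derived from the locale instead.

```python
from __future__ import print_function

import os
import subprocess
import sys


def child_stdout_encoding(extra_env=None):
    """Run a child interpreter with stdout piped and return the value of
    sys.stdout.encoding it reports (hypothetical helper)."""
    env = dict(os.environ)
    env.pop("PYTHONIOENCODING", None)  # start without the override
    if extra_env:
        env.update(extra_env)
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
    )
    # The child prints an encoding name or "None", both plain ASCII.
    return out.decode("ascii").strip()


if __name__ == "__main__":
    print("piped, no PYTHONIOENCODING:", child_stdout_encoding())
    print("piped, PYTHONIOENCODING=utf8:",
          child_stdout_encoding({"PYTHONIOENCODING": "utf8"}))
```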

Do not call sys.setdefaultencoding("UTF-8"); it may corrupt your data silently and/or break 3rd-party modules that do not expect it. Remember that Python 2 uses sys.getdefaultencoding() to implicitly convert bytestrings (str) to/from unicode, e.g., in "a" + u"b". See also the quote in @mesilliac's answer.
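In Python 2, the implicit conversion in "a" + u"b" amounts to decoding the bytestring with sys.getdefaultencoding(). The sketch below makes that decode step explicit. The helper coerce_like_py2 is hypothetical and models only the decode, not the rest of Python 2's semantics. It shows why relying on a global default is fragile: ASCII-only bytes coerce without complaint, so the problem stays hidden until non-ASCII data arrives. It also shows that the same bytes mean different text under different encodings.

```python
from __future__ import unicode_literals


def coerce_like_py2(raw, text, default_encoding="ascii"):
    """Emulate the decode step behind Python 2's raw + text, where raw is a
    bytestring and text is unicode (hypothetical helper, illustration only)."""
    return raw.decode(default_encoding) + text


# ASCII-only bytes coerce fine, so the implicit conversion goes unnoticed...
ascii_ok = coerce_like_py2(b"a", "b")

# ...until non-ASCII bytes show up.
try:
    coerce_like_py2(b"\xe2\x82\xac", "b")
except UnicodeDecodeError:
    failed = True
else:
    failed = False

# The same two bytes decode to different text depending on the encoding, so
# decode explicitly with the encoding you actually know the data uses.
as_utf8 = b"\xc3\xa9".decode("utf-8")      # one character: e with acute accent
as_latin1 = b"\xc3\xa9".decode("latin-1")  # two characters: A with tilde, copyright sign
```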
