How to print UTF-8 encoded text to the console in Python < 3?
Problem description
I'm running a recent Linux system where all my locales are UTF-8:
LANG=de_DE.UTF-8
LANGUAGE=
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
...
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=
Now I want to write UTF-8 encoded content to the console.
Right now Python uses UTF-8 for the FS encoding but sticks to ASCII for the default encoding :-(
>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> sys.getfilesystemencoding()
'UTF-8'
I thought the best (clean) way to do this was setting the PYTHONIOENCODING environment variable. But it seems that Python ignores it. At least on my system I keep getting ascii as the default encoding, even after setting the envvar.
# tried this in ~/.bashrc and ~/.profile (also sourced them)
# and on the commandline before running python
export PYTHONIOENCODING=UTF-8
If I do the following at the start of a script, it works though:
>>> import sys
>>> reload(sys) # to enable `setdefaultencoding` again
<module 'sys' (built-in)>
>>> sys.setdefaultencoding("UTF-8")
>>> sys.getdefaultencoding()
'UTF-8'
But that approach seems unclean. So, what's a good way to accomplish this?
Workaround
Instead of changing the default encoding, which is not a good idea (see mesilliac's answer), I just wrap sys.stdout with a StreamWriter like this:
import codecs, locale, sys  # modules the wrapper below relies on
sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout)
See this gist for a small utility function that handles it.
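The answer warns that this wrapper may break when some code needs to print bytes. The following sketch reproduces that failure without a terminal. It is an illustration, not part of the original post: it assumes an in-memory io.BytesIO in place of the console and hardcodes "utf-8" in place of locale.getpreferredencoding() so the result does not depend on the machine's locale. It runs on Python 2.7 and Python 3.

```python
import codecs
import io

# In-memory stand-in for the console's byte stream (assumption for illustration).
buf = io.BytesIO()
writer = codecs.getwriter("utf-8")(buf)

# A unicode object is encoded to UTF-8 and written as bytes.
writer.write(u"price: 10 \N{EURO SIGN}\n")
after_unicode = buf.getvalue()

# Code you do not control that writes an already-encoded bytestring.
try:
    writer.write(b"price: 10 \xe2\x82\xac\n")
    bytes_error = None
except (UnicodeDecodeError, TypeError) as exc:
    # Python 2: the bytes are implicitly decoded with the ASCII default
    # encoding -> UnicodeDecodeError. Python 3: the UTF-8 encoder only
    # accepts str -> TypeError.
    bytes_error = exc
```

Nothing is written for the failed call, so buf still holds only the first line.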
How to print UTF-8 encoded text to the console in Python < 3?
print u"some unicode text \N{EURO SIGN}"
print b"some utf-8 encoded bytestring \xe2\x82\xac".decode('utf-8')
That is, if you have a Unicode string, print it directly; if you have a bytestring, convert it to Unicode first.
Your locale settings (LANG, LC_CTYPE) indicate a UTF-8 locale, and therefore (in theory) you could print a UTF-8 bytestring directly and it should be displayed correctly in your terminal (if the terminal settings are consistent with the locale settings, which they should be). But you should avoid it: do not hardcode the character encoding of your environment inside your script; print Unicode directly instead.
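A minimal sketch of that rule that runs on Python 2.7 and Python 3. The names to_text and print_text are hypothetical helpers, and the "utf-8" default in to_text is an assumption about where the bytes came from (for example a UTF-8 file), not about the terminal:

```python
from __future__ import print_function  # same code on Python 2.7 and 3


def to_text(value, encoding="utf-8"):
    """Return value as a Unicode string, decoding a bytestring first.

    encoding describes the bytes' origin (assumed UTF-8 here), not the console.
    """
    if isinstance(value, bytes):  # on Python 2, bytes is an alias of str
        return value.decode(encoding)
    return value


def print_text(value, stream=None):
    """Print Unicode and let Python encode it for the output stream."""
    print(to_text(value), file=stream)  # stream=None means sys.stdout
```

With a UTF-8 locale, print_text(b"\xe2\x82\xac") and print_text(u"\N{EURO SIGN}") print the same character without any extra configuration; the script never names the terminal's encoding.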
There are many wrong assumptions in your question.
With your locale settings, you do not need to set PYTHONIOENCODING to print Unicode to the terminal. A UTF-8 locale supports all Unicode characters, i.e., it works as is.
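One way to check the claim that UTF-8 covers every character while narrower encodings do not: try to encode a mixed sample. The helper encodable and the sample string are arbitrary choices for illustration.

```python
def encodable(text, encoding):
    """Return True if every character of text can be encoded with encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# German sharp s, the euro sign, two CJK characters and an emoji
# outside the Basic Multilingual Plane.
sample = u"Stra\u00dfe \u20ac \u65e5\u672c \U0001F600"

results = dict((name, encodable(sample, name)) for name in ("utf-8", "latin-1", "ascii"))
# results == {'utf-8': True, 'latin-1': False, 'ascii': False}
```

Latin-1 fails here only because of the euro sign and the characters after it; UTF-8 accepts the whole string.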
You do not need the workaround sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout). It may break if some code (that you do not control) needs to print bytes, and/or it may break while printing Unicode to the Windows console (wrong code page, can't print undecodable characters). Correct locale settings and/or the PYTHONIOENCODING envvar are enough. Also, if you need to replace sys.stdout, then use io.TextIOWrapper() instead of the codecs module, as the win-unicode-console package does.
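A sketch of the io.TextIOWrapper route, assuming stdout is backed by a real file descriptor. Both function names are hypothetical, and hardcoding "utf-8" is itself an assumption you would only make if fixing the locale or PYTHONIOENCODING is impossible:

```python
import io
import sys


def utf8_text_stream(fd, line_buffering=True):
    """Return a UTF-8 io.TextIOWrapper that writes to file descriptor fd.

    closefd=False leaves fd open when the wrapper is closed or
    garbage-collected, so sys.__stdout__ keeps working.
    """
    raw = io.open(fd, "wb", closefd=False)  # buffered binary writer on fd
    return io.TextIOWrapper(raw, encoding="utf-8", line_buffering=line_buffering)


def replace_stdout_with_utf8():
    """Swap sys.stdout for a UTF-8 text wrapper (last resort)."""
    sys.stdout.flush()  # do not lose output buffered in the old stream
    sys.stdout = utf8_text_stream(sys.stdout.fileno())
```

On Python 2 this wrapper accepts only unicode objects, so the caveat about code that prints bytestrings still applies.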
sys.getdefaultencoding() is unrelated to your locale settings and to PYTHONIOENCODING. Your assumption that setting PYTHONIOENCODING should change sys.getdefaultencoding() is incorrect. You should check sys.stdout.encoding instead.
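To see this for yourself, you can spawn a child interpreter with a different PYTHONIOENCODING and compare the two values. child_encodings is a hypothetical helper. It reruns the current interpreter (sys.executable) with extra environment variables and assumes spawning a subprocess is acceptable:

```python
import os
import subprocess
import sys

# Prints sys.stdout.encoding, then sys.getdefaultencoding(); valid on Python 2 and 3.
PROBE = "import sys; print(sys.stdout.encoding); print(sys.getdefaultencoding())"


def child_encodings(**env_overrides):
    """Return (sys.stdout.encoding, sys.getdefaultencoding()) seen by a child process."""
    env = dict(os.environ, **env_overrides)
    out = subprocess.check_output([sys.executable, "-c", PROBE], env=env)
    stdout_encoding, default_encoding = out.decode("ascii").split()
    return stdout_encoding, default_encoding
```

For example, child_encodings(PYTHONIOENCODING="latin-1") reports a Latin-1 codec for stdout, while the second value stays 'ascii' on Python 2 ('utf-8' on Python 3) whatever PYTHONIOENCODING says.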
sys.getdefaultencoding() is not used when you print to the console. On Python 2 it may be used as a fallback if stdout is redirected to a file/pipe, unless PYTHONIOENCODING is set:
$ python2 -c'import sys; print(sys.stdout.encoding)'
UTF-8
$ python2 -c'import sys; print(sys.stdout.encoding)' | cat
None
$ PYTHONIOENCODING=utf8 python2 -c'import sys; print(sys.stdout.encoding)' | cat
utf8
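The fallback can be summarized as a small rule. encoding_used_by_print is a hypothetical helper that approximates the Python 2 behavior described above; it is not a CPython API. PipeLike is a stand-in for a redirected Python 2 stdout whose encoding attribute is None:

```python
import sys


def encoding_used_by_print(stream):
    """Approximate which codec Python 2 uses to print a unicode object to stream.

    stream.encoding wins when Python set one (terminal, or PYTHONIOENCODING);
    if it is None, as for a Python 2 pipe without PYTHONIOENCODING, the
    default encoding is used instead, normally 'ascii' on Python 2.
    """
    return getattr(stream, "encoding", None) or sys.getdefaultencoding()


class PipeLike(object):
    """Stand-in for Python 2 stdout redirected to a pipe."""
    encoding = None
```

Under this rule, printing u"\N{EURO SIGN}" through `| cat` on Python 2 needs the ASCII codec and fails, which is why PYTHONIOENCODING=utf8 fixes the redirected case.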
Do not call sys.setdefaultencoding("UTF-8"); it may silently corrupt your data and/or break 3rd-party modules that do not expect it. Remember that in Python 2 sys.getdefaultencoding() is used to convert bytestrings (str) to/from unicode implicitly, e.g., "a" + u"b". See also the quote in @mesilliac's answer.
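The implicit conversion becomes visible with a non-ASCII bytestring. This sketch (implicit_concat is a hypothetical helper; the code runs on Python 2.7 and 3) contrasts it with decoding explicitly. After sys.setdefaultencoding("UTF-8"), the same Python 2 expression would instead succeed, for this module and for every other module in the process.

```python
utf8_bytes = b"\xe2\x82\xac"  # EURO SIGN encoded as UTF-8


def implicit_concat(data, text):
    """Mix bytes and unicode and return the result or the exception raised.

    Python 2 decodes data with sys.getdefaultencoding() ('ascii' by default),
    so non-ASCII bytes raise UnicodeDecodeError; Python 3 refuses to mix
    the types and raises TypeError.
    """
    try:
        return data + text
    except (UnicodeDecodeError, TypeError) as exc:
        return exc


mixed = implicit_concat(utf8_bytes, u" b")      # an exception object
explicit = utf8_bytes.decode("utf-8") + u" b"   # u"\u20ac b" on both versions
```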