How to print UTF-8 encoded text to the console in Python < 3?
Question
I'm running a recent Linux system where all my locales are UTF-8:
    LANG=de_DE.UTF-8
    LANGUAGE=
    LC_CTYPE="de_DE.UTF-8"
    LC_NUMERIC="de_DE.UTF-8"
    LC_TIME="de_DE.UTF-8"
    ...
    LC_IDENTIFICATION="de_DE.UTF-8"
    LC_ALL=
Now I want to write UTF-8 encoded content to the console.
Right now Python uses UTF-8 for the FS encoding but sticks to ASCII for the default encoding :-(
    >>> import sys
    >>> sys.getdefaultencoding()
    'ascii'
    >>> sys.getfilesystemencoding()
    'UTF-8'
I thought the best (clean) way to do this was setting the PYTHONIOENCODING environment variable. But it seems that Python ignores it. At least on my system I keep getting ascii as default encoding, even after setting the envvar.
    # tried this in ~/.bashrc and ~/.profile (also sourced them)
    # and on the commandline before running python
    export PYTHONIOENCODING=UTF-8
If I do the following at the start of a script, it works though:
    >>> import sys
    >>> reload(sys)  # to enable `setdefaultencoding` again
    <module 'sys' (built-in)>
    >>> sys.setdefaultencoding("UTF-8")
    >>> sys.getdefaultencoding()
    'UTF-8'
But that approach seems unclean. So, what's a good way to accomplish this?
Workaround
Instead of changing the default encoding - which is not a good idea (see mesilliac's answer) - I just wrap sys.stdout with a StreamWriter like this:
    sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout)
See this gist for a small utility function that handles it.
Solution
How to print UTF-8 encoded text to the console in Python < 3?
    print u"some unicode text \N{EURO SIGN}"
    print b"some utf-8 encoded bytestring \xe2\x82\xac".decode('utf-8')
i.e., if you have a Unicode string then print it directly. If you have a bytestring then convert it to Unicode first.
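The rule above (print Unicode as is, decode bytestrings first) can be sketched as a small helper. This is only an illustration: `to_text` is a hypothetical name, and UTF-8 as the default encoding of the incoming bytes is an assumption about your data, not something Python can detect for you. The sketch is written so that it also runs on Python 3.

```python
def to_text(value, encoding="utf-8"):
    """Return a Unicode string for either a bytestring or a Unicode string.

    `encoding` is the encoding of the *data* (assumed UTF-8 here),
    not the encoding of the terminal.
    """
    if isinstance(value, bytes):  # `bytes` is an alias of `str` on Python 2
        return value.decode(encoding)
    return value


euro_from_bytes = to_text(b"\xe2\x82\xac")       # UTF-8 bytes of the euro sign
euro_from_text = to_text(u"\N{EURO SIGN}")       # already Unicode, returned unchanged
```

Both values are Unicode strings that can be passed to print directly; the terminal encoding is then applied by `sys.stdout`, not by your script.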
Your locale settings (LANG, LC_CTYPE) indicate a utf-8 locale and therefore (in theory) you could print a utf-8 bytestring directly and it should be displayed correctly in your terminal (if terminal settings are consistent with the locale settings, and they should be). But you should avoid it: do not hardcode the character encoding of your environment inside your script; print Unicode directly instead.
There are many wrong assumptions in your question.
You do not need to set PYTHONIOENCODING with your locale settings to print Unicode to the terminal. A utf-8 locale supports all Unicode characters, i.e., it works as is.
You do not need the workaround sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout). It may break if some code (that you do not control) does need to print bytes, and/or it may break while printing Unicode to the Windows console (wrong codepage, can't print undecodable characters). Correct locale settings and/or the PYTHONIOENCODING envvar are enough. Also, if you need to replace sys.stdout, then use io.TextIOWrapper() instead of the codecs module, like the win-unicode-console package does.
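If sys.stdout really has to be replaced, a wrapper based on io.TextIOWrapper() might look roughly like the sketch below. The helper name `make_text_writer` is hypothetical, and an in-memory io.BytesIO stands in for the real binary stream so that the result can be inspected; this is not how win-unicode-console itself is implemented.

```python
import io


def make_text_writer(binary_stream, encoding="utf-8", errors="strict"):
    """Wrap a binary stream so that Unicode text written to it is encoded."""
    return io.TextIOWrapper(binary_stream, encoding=encoding, errors=errors)


# Stand-in for the underlying byte stream of stdout (assumption for the demo).
raw = io.BytesIO()
writer = make_text_writer(raw)
writer.write(u"5 \N{EURO SIGN}")
writer.flush()                    # push the encoded bytes into `raw`
encoded = raw.getvalue()
```

Here `encoded` holds the UTF-8 bytes of the text. In a real script the binary stream would be the byte-level stream behind stdout instead of a BytesIO object.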
sys.getdefaultencoding() is unrelated to your locale settings and to PYTHONIOENCODING. Your assumption that setting PYTHONIOENCODING should change sys.getdefaultencoding() is incorrect. You should check sys.stdout.encoding instead.
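To see that these are separate settings, the values can be collected side by side. A minimal sketch; `report_encodings` is a hypothetical helper, and the getattr() guard is only there because a replaced stdout object may lack an `encoding` attribute.

```python
import sys


def report_encodings():
    """Collect the encodings discussed above into one dict."""
    return {
        # Used for implicit str <-> unicode conversion on Python 2.
        "default": sys.getdefaultencoding(),
        # Used when Unicode text is printed; may be None on Python 2
        # if stdout is redirected.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Used for file names.
        "filesystem": sys.getfilesystemencoding(),
    }


encodings = report_encodings()
```

Printing `encodings` with and without PYTHONIOENCODING set should show that only the "stdout" entry reacts to the variable.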
sys.getdefaultencoding() is not used when you print to the console. It may be used as a fallback on Python 2 if stdout is redirected to a file/pipe unless PYTHONIOENCODING is set:
    $ python2 -c 'import sys; print(sys.stdout.encoding)'
    UTF-8
    $ python2 -c 'import sys; print(sys.stdout.encoding)' | cat
    None
    $ PYTHONIOENCODING=utf8 python2 -c 'import sys; print(sys.stdout.encoding)' | cat
    utf8
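The same check can be scripted. The sketch below starts a child interpreter with its stdout connected to a pipe and reports what the child sees as sys.stdout.encoding. `child_stdout_encoding` is a hypothetical helper, it uses whatever interpreter runs the script (sys.executable) rather than python2 specifically, and latin-1 is chosen only so that the effect of the variable is easy to tell apart from a UTF-8 default.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding=None):
    """Return sys.stdout.encoding as seen by a child whose stdout is a pipe."""
    env = dict(os.environ)
    env.pop("PYTHONIOENCODING", None)       # start from a clean state
    if io_encoding is not None:
        env["PYTHONIOENCODING"] = io_encoding
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
    )
    return out.decode("ascii").strip()


reported = child_stdout_encoding("latin-1")
# Normalize the spelling, since interpreters may report aliases differently.
normalized = codecs.lookup(reported).name
```

Calling `child_stdout_encoding()` without an argument shows the fallback behaviour of the interpreter in use; on Python 2 the child prints None in that case, as in the shell session above.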
Do not call sys.setdefaultencoding("UTF-8"); it may corrupt your data silently and/or break 3rd-party modules that do not expect it. Remember that sys.getdefaultencoding() is used to convert bytestrings (str) to/from unicode implicitly in Python 2, e.g., "a" + u"b". See also the quote in @mesilliac's answer.
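The implicit conversion mentioned above can be made visible by doing the decode step by hand. On Python 2, mixing a bytestring and a Unicode string decodes the bytestring with sys.getdefaultencoding() behind the scenes; the sketch below performs that step explicitly. `join_text` is a hypothetical helper, written so that it also runs on Python 3.

```python
def join_text(byte_part, text_part, encoding):
    """Decode `byte_part` explicitly, then concatenate it with `text_part`."""
    return byte_part.decode(encoding) + text_part


euro_bytes = b"\xe2\x82\xac"  # UTF-8 encoding of the euro sign

# With the real encoding of the data, the result is correct.
good = join_text(euro_bytes, u" ok", "utf-8")

# With 'ascii' (the Python 2 default) the decode step fails loudly
# instead of producing wrong text.
try:
    join_text(euro_bytes, u" ok", "ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True
```

Decoding explicitly at the boundary where bytes enter the program keeps this behaviour independent of the process-wide default, which is why changing that default is unnecessary.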