How to print UTF-8 encoded text to the console in Python < 3?


Problem description


I'm running a recent Linux system where all my locales are UTF-8:

LANG=de_DE.UTF-8
LANGUAGE=
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
...
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=

Now I want to write UTF-8 encoded content to the console.

Right now Python uses UTF-8 for the FS encoding but sticks to ASCII for the default encoding :-(

>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> sys.getfilesystemencoding()
'UTF-8'

I thought the best (clean) way to do this was setting the PYTHONIOENCODING environment variable. But it seems that Python ignores it. At least on my system I keep getting ascii as default encoding, even after setting the envvar.

# tried this in ~/.bashrc and ~/.profile (also sourced them)
# and on the commandline before running python
export PYTHONIOENCODING=UTF-8

If I do the following at the start of a script, it works though:

>>> import sys
>>> reload(sys)  # to enable `setdefaultencoding` again
<module 'sys' (built-in)>
>>> sys.setdefaultencoding("UTF-8")
>>> sys.getdefaultencoding()
'UTF-8'

But that approach seems unclean. So, what's a good way to accomplish this?

Workaround

Instead of changing the default encoding - which is not a good idea (see mesilliac's answer) - I just wrap sys.stdout with a StreamWriter like this:

import codecs, locale, sys
sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout)

See this gist for a small utility function that handles it.

Solution

How to print UTF-8 encoded text to the console in Python < 3?

print u"some unicode text \N{EURO SIGN}"
print b"some utf-8 encoded bytestring \xe2\x82\xac".decode('utf-8')

i.e., if you have a Unicode string then print it directly. If you have a bytestring then convert it to Unicode first.
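The rule above can be sketched as a small helper. This is only an illustration: the name to_text is hypothetical, and the snippet is written so that it should behave the same on Python 2.7 and Python 3. The encoding is stated explicitly at the point where the bytes enter the program, which is the only place where it has to be known.

```python
from __future__ import unicode_literals


def to_text(value, encoding="utf-8"):
    """Return Unicode text; decode byte strings using the given encoding.

    `to_text` is a hypothetical helper name used for illustration only.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# A UTF-8 encoded bytestring is decoded once, at the boundary.
from_bytes = to_text(b"some utf-8 encoded bytestring \xe2\x82\xac")

# A Unicode string passes through unchanged.
from_text = to_text("some unicode text \u20ac")
```

Both results are Unicode strings, so they can be handed to print as they are and Python encodes them for the terminal.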

Your locale settings (LANG, LC_CTYPE) indicate a utf-8 locale, so in theory you could print a utf-8 bytestring directly and it should be displayed correctly in your terminal, provided the terminal settings are consistent with the locale settings (and they should be). Avoid doing that anyway: do not hardcode the character encoding of your environment inside your script; print Unicode directly instead.

There are many wrong assumptions in your question.

With your locale settings, you do not need to set PYTHONIOENCODING to print Unicode to the terminal. A utf-8 locale supports all Unicode characters, i.e., it works as is.

You do not need the workaround sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout). It may break if some code (that you do not control) needs to print bytes, and it may break while printing Unicode to the Windows console (wrong codepage, can't print undecodable characters). Correct locale settings and/or the PYTHONIOENCODING envvar are enough. Also, if you need to replace sys.stdout, use io.TextIOWrapper() instead of the codecs module, as the win-unicode-console package does.
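As a rough sketch of the io.TextIOWrapper() suggestion, the snippet below wraps a binary stream in a text layer. To keep it self-contained it uses an in-memory io.BytesIO instead of the real stdout; the helper name wrap_binary_stream and the choice of "utf-8" are assumptions for the example, and how to obtain a suitable binary stream for the actual stdout differs between Python versions and is not shown here.

```python
import io


def wrap_binary_stream(binary_stream, encoding="utf-8"):
    """Wrap a binary stream so that Unicode text written to it is encoded.

    `wrap_binary_stream` is a hypothetical helper for illustration only.
    """
    return io.TextIOWrapper(
        binary_stream,
        encoding=encoding,
        errors="strict",      # fail loudly instead of silently dropping data
        newline="\n",         # write "\n" unchanged on every platform
        line_buffering=True,  # flush whenever a newline is written
    )


# Stand-in for a binary stdout, so the encoded bytes can be inspected.
buffer = io.BytesIO()
writer = wrap_binary_stream(buffer)
writer.write(u"price: \u20ac\n")
writer.flush()

encoded = buffer.getvalue()
```

After the write, encoded holds the UTF-8 bytes for the text, including the three-byte sequence for the euro sign.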

sys.getdefaultencoding() is unrelated to your locale settings and to PYTHONIOENCODING. Your assumption that setting PYTHONIOENCODING should change sys.getdefaultencoding() is incorrect. You should check sys.stdout.encoding instead.
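To see which setting is which, a small diagnostic along these lines may help. It is a sketch only: encoding_report is a hypothetical name, and the values it shows depend entirely on the interpreter and environment it runs in.

```python
import locale
import os
import sys


def encoding_report():
    """Collect the encoding-related settings that are easy to confuse.

    `encoding_report` is a hypothetical helper for illustration only.
    """
    return {
        # What print actually uses for the terminal; may be None on
        # Python 2 when stdout is redirected.
        "sys.stdout.encoding": getattr(sys.stdout, "encoding", None),
        # Used for implicit str/unicode conversion on Python 2;
        # unrelated to the locale and to PYTHONIOENCODING.
        "sys.getdefaultencoding()": sys.getdefaultencoding(),
        "sys.getfilesystemencoding()": sys.getfilesystemencoding(),
        "locale.getpreferredencoding()": locale.getpreferredencoding(),
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
    }


for name, value in sorted(encoding_report().items()):
    print("%s = %r" % (name, value))
```

Setting PYTHONIOENCODING and rerunning should change only the sys.stdout.encoding line, not sys.getdefaultencoding().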

sys.getdefaultencoding() is not used when you print to the console. It may be used as a fallback on Python 2 if stdout is redirected to a file/pipe, unless PYTHONIOENCODING is set:

$ python2 -c 'import sys; print(sys.stdout.encoding)'
UTF-8
$ python2 -c 'import sys; print(sys.stdout.encoding)' | cat
None
$ PYTHONIOENCODING=utf8 python2 -c 'import sys; print(sys.stdout.encoding)' | cat
utf8
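The same check can be scripted. The sketch below starts a child interpreter with its stdout connected to a pipe and reports what the child sees as sys.stdout.encoding. The function name piped_stdout_encoding is hypothetical, and the child is whatever interpreter runs the script (sys.executable), so the result without PYTHONIOENCODING depends on that interpreter: Python 2 reports None as in the session above, while Python 3 picks an encoding on its own.

```python
import os
import subprocess
import sys

CHILD_CODE = "import sys; print(sys.stdout.encoding)"


def piped_stdout_encoding(io_encoding=None):
    """Return sys.stdout.encoding as seen by a child whose stdout is a pipe.

    `piped_stdout_encoding` is a hypothetical helper for illustration only.
    If io_encoding is given, it is passed to the child as PYTHONIOENCODING.
    """
    env = dict(os.environ)
    env.pop("PYTHONIOENCODING", None)  # start from a known state
    if io_encoding is not None:
        env["PYTHONIOENCODING"] = io_encoding
    output = subprocess.check_output(
        [sys.executable, "-c", CHILD_CODE], env=env
    )
    return output.decode("ascii").strip()


print("without PYTHONIOENCODING:", piped_stdout_encoding())
print("with PYTHONIOENCODING=utf8:", piped_stdout_encoding("utf8"))
```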

Do not call sys.setdefaultencoding("UTF-8"); it may corrupt your data silently and/or break 3rd-party modules that do not expect it. Remember that sys.getdefaultencoding() is used in Python 2 to convert bytestrings (str) to/from unicode implicitly, e.g., in "a" + u"b". See also the quote in @mesilliac's answer.
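To make the implicit conversion concrete, the sketch below spells out what Python 2 effectively does for "a" + u"b": it decodes the bytestring with the default encoding and then concatenates. The function concat_like_python2 is a hypothetical stand-in written for illustration (Python 3 has no such implicit conversion), and "ascii" is the Python 2 default shown in the question.

```python
def concat_like_python2(byte_string, text, default_encoding="ascii"):
    """Mimic Python 2's implicit str + unicode concatenation.

    Hypothetical helper for illustration only: the bytestring is decoded
    with the default encoding before the two parts are joined.
    """
    return byte_string.decode(default_encoding) + text


# Pure ASCII bytes decode fine, so the mixing goes unnoticed.
ascii_result = concat_like_python2(b"a", u"b")

# Non-ASCII bytes fail under the "ascii" default instead of guessing.
try:
    concat_like_python2(b"\xe2\x82\xac", u"b")
    implicit_failed = False
except UnicodeDecodeError:
    implicit_failed = True

# The explicit alternative: decode with a known encoding at the boundary.
explicit_result = b"\xe2\x82\xac".decode("utf-8") + u"b"
```

Changing the default encoding would make the failing case succeed for every module in the process, including code that relies on the error to catch bytes whose encoding is not actually UTF-8. Decoding explicitly keeps that decision local.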
