How to print utf-8 to console with Python 3.4 (Windows 8)?


Question

I'm trying to print UTF-8 card symbols (♠, ♥, ♦, ♣) from a Python module to a Windows console. The console I'm using is Git Bash, with Console2 as the front end. I've tried or read about a number of approaches, listed below, and nothing has worked so far.

  • Made sure the console can handle UTF-8 characters. These two tests make me believe that the console isn't the problem.

  • Attempted the same thing from the Python module.
    When I execute the .py file, this is the result:

     print(u'♠')
     UnicodeEncodeError: 'charmap' codec can't encode character '\u2660' in position 0: character maps to <undefined>

  • Attempted to encode ♠. This gives me back the UTF-8 encoded bytes, but still no spade symbol.

     text = '♠'
     print(text.encode('utf-8'))
     b'\xe2\x99\xa0'

I feel like I'm missing a step or not understanding the whole encode/decode process. I've read this, this, and this. The last of these pages suggests wrapping sys.stdout in the code, but this article says wrapping stdout is unnecessary and points to another page that uses the codecs module.
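For reference, the two approaches mentioned in the question usually look something like the sketch below: wrap the binary buffer behind sys.stdout in a new io.TextIOWrapper, or wrap it in a StreamWriter from the codecs module. The helper names, the UTF-8 default and errors="replace" are assumptions for illustration, not taken from the pages the question links. Either way this only changes which bytes Python sends; whether they show up as ♠ still depends on what the console expects.

```python
import codecs
import io


def wrap_text_stream(binary_stream, encoding="utf-8", errors="replace"):
    """Hypothetical helper: a text stream that encodes str with `encoding`
    before handing the bytes to `binary_stream`."""
    # line_buffering=True flushes on every newline, like an interactive stdout.
    return io.TextIOWrapper(binary_stream, encoding=encoding,
                            errors=errors, line_buffering=True)


def codecs_writer(binary_stream, encoding="utf-8"):
    """Hypothetical helper: the older codecs-module equivalent, a StreamWriter
    that encodes each write() call and passes the bytes straight through."""
    return codecs.getwriter(encoding)(binary_stream)
```

In a script you would typically do `sys.stdout = wrap_text_stream(sys.stdout.buffer)` once, near the top, before any print calls.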

Solution

What I'm trying to do is print UTF-8 card symbols (♠, ♥, ♦, ♣) from a Python module to a Windows console

UTF-8 is a byte encoding of Unicode characters. ♠♥♦♣ are Unicode characters which can be reproduced in a variety of encodings and UTF-8 is one of those encodings—as a UTF, UTF-8 can reproduce any Unicode character. But there is nothing specifically "UTF-8" about those characters.
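To make the character/encoding distinction concrete, the sketch below takes each suit as a single Unicode code point and shows the different byte sequences that UTF-8, UTF-16 and UTF-32 produce for it. The helper name and the choice of big-endian UTF-16/32 (so no byte-order mark is added) are assumptions for the example.

```python
# The four card suits as Unicode code points: ♠ U+2660, ♥ U+2665, ♦ U+2666, ♣ U+2663.
SUITS = "\u2660\u2665\u2666\u2663"


def encodings_of(char, encodings=("utf-8", "utf-16-be", "utf-32-be")):
    """Hypothetical helper: map each encoding name to the bytes for `char`."""
    return {name: char.encode(name) for name in encodings}


if __name__ == "__main__":
    # Print only ASCII (code point labels and bytes reprs) so this runs on any console.
    for suit in SUITS:
        print("U+{:04X}".format(ord(suit)), encodings_of(suit))
```

The same character ♠ comes out as three bytes in UTF-8, two in UTF-16 and four in UTF-32; none of these byte strings is "the" spade.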

Other encodings that can reproduce the characters ♠♥♦♣ are Windows code pages 850 and 437, which your console is likely to be using under a Western European install of Windows. You can print ♠ in these encodings, but you are not using UTF-8 to do so, and you won't be able to use other Unicode characters that are available in UTF-8 but fall outside these code pages.
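A quick way to check whether a given codec can represent a piece of text is simply to try encoding it with strict error handling, as in this sketch (the helper names and the list of candidate encodings are illustrative). It is worth running rather than assuming: CPython's bundled cp437 and cp850 codecs map bytes 0x00 to 0x1F to control characters rather than to the DOS glyphs drawn for them, so they may well report False for ♠ even though the console font can display a spade.

```python
SUITS = "\u2660\u2665\u2666\u2663"  # ♠ ♥ ♦ ♣


def can_encode(text, encoding):
    """Hypothetical helper: True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)  # strict error handling: raises on anything unmappable
    except UnicodeEncodeError:
        return False
    return True


def coverage(text=SUITS, encodings=("utf-8", "cp437", "cp850", "cp1252", "ascii")):
    """Check `text` against several candidate encodings at once."""
    return {name: can_encode(text, name) for name in encodings}


if __name__ == "__main__":
    for name, ok in coverage().items():
        print(name, ok)
```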

print(u'♠')
UnicodeEncodeError: 'charmap' codec can't encode character '\u2660'

In Python 3 this is the same as the print('♠') test you did above, so there is something different about how you are invoking the script containing this print, compared to your py -3.4. What does sys.stdout.encoding give you from the script?
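To answer that question it helps to collect the relevant settings in one place and run the same snippet both ways (interactively and from the script). This is a hypothetical diagnostic, not part of any library. One thing to look for: some terminal front ends present stdout to Python as a pipe rather than a Windows console, and in that case Python 3.4 may fall back to the locale's preferred encoding (for example cp1252, which has no ♠ at all).

```python
import locale
import os
import sys


def stdio_encoding_report():
    """Hypothetical diagnostic: the settings that decide how print() turns str into bytes."""
    return {
        # The codec print() will actually use for sys.stdout.
        "sys.stdout.encoding": getattr(sys.stdout, "encoding", None),
        # What happens to characters that codec cannot represent ('strict' raises).
        "sys.stdout.errors": getattr(sys.stdout, "errors", None),
        # True for a real console, False when output is piped or redirected.
        "sys.stdout.isatty()": sys.stdout.isatty(),
        # The locale's preferred encoding, often the fallback when stdout is not a console.
        "locale preferred": locale.getpreferredencoding(False),
        # Explicit override, if any.
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
    }


if __name__ == "__main__":
    for key, value in sorted(stdio_encoding_report().items()):
        print("{}: {}".format(key, value))
```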

To get print working correctly you would have to make sure Python picks up the right encoding. If it is not doing that adequately from the terminal settings you would indeed have to set PYTHONIOENCODING to cp437.
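To see what PYTHONIOENCODING does without changing your shell settings, a sketch like the following runs a child interpreter with the variable set and captures the raw bytes it writes. The helper name and the two sample settings are illustrative. The variable also accepts an "encoding:errors" form, so "ascii:backslashreplace" prints an escape instead of raising, whereas plain "ascii" would make the child fail with UnicodeEncodeError (and check_output would raise CalledProcessError).

```python
import os
import subprocess
import sys

# Child program: print U+2660 (♠). The escape keeps this command line pure ASCII.
PRINT_SPADE = "print('\\u2660')"


def run_with_io_encoding(code, io_encoding):
    """Hypothetical helper: run `code` in a fresh interpreter with
    PYTHONIOENCODING set and return the raw bytes it wrote to stdout."""
    env = dict(os.environ)
    env["PYTHONIOENCODING"] = io_encoding  # "encoding" or "encoding:errors"
    return subprocess.check_output([sys.executable, "-c", code], env=env)


if __name__ == "__main__":
    for setting in ("utf-8", "ascii:backslashreplace"):
        print(setting, run_with_io_encoding(PRINT_SPADE, setting))
```

With utf-8 the child should emit the three UTF-8 bytes of ♠ followed by a line ending; with ascii:backslashreplace it should emit the six ASCII characters \u2660 instead.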

>>> text = '♠'
>>> print(text.encode('utf-8'))
b'\xe2\x99\xa0'

print can only print Unicode strings. For other types, including the bytes string that results from the encode() method, it gets the literal representation (repr) of the object. b'\xe2\x99\xa0' is how you would write a Python 3 bytes literal containing a UTF-8 encoded ♠.
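The difference is easy to check without touching the console at all: this sketch captures what print writes into an in-memory text stream (the helper name is made up for the example). Printing the bytes object yields its repr text, while decoding it back to str first yields the spade itself.

```python
import io


def printed(obj):
    """Hypothetical helper: the exact text print() produces for `obj`,
    captured in an in-memory stream instead of the console."""
    buffer = io.StringIO()
    print(obj, file=buffer)
    return buffer.getvalue()


text = "\u2660"                  # ♠, a str (Unicode text)
encoded = text.encode("utf-8")   # bytes: b'\xe2\x99\xa0'
```

Here printed(encoded) is the 15-character text b'\xe2\x99\xa0' plus a newline, and printed(encoded.decode("utf-8")) is ♠ plus a newline.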

If what you want to do is bypass print's implicit encoding (to whatever the terminal settings or PYTHONIOENCODING select) and substitute your own, you can do that explicitly:

>>> import sys
>>> sys.stdout.buffer.write('♠'.encode('cp437'))

This will of course generate wrong output for any console not running code page 437 (e.g. non-Western-European installs). Generally, for apps that use C stdio, as Python does, getting non-ASCII characters to the Windows console is just too unreliable to bother with.
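If you do go the explicit route, a small wrapper like this sketch keeps the encoding choice in one place. The helper name and the errors="replace" default are assumptions: with that default, characters the chosen code page cannot represent become b"?" instead of raising UnicodeEncodeError, and passing errors="strict" restores the exception.

```python
import io


def write_encoded(text, encoding, binary_stream, errors="replace"):
    """Hypothetical helper: encode `text` explicitly and write the bytes to
    `binary_stream`, bypassing print()'s implicit encoding. Returns the bytes."""
    # errors="replace" turns unmappable characters into b"?" instead of raising
    # UnicodeEncodeError; pass errors="strict" to get the exception back.
    data = text.encode(encoding, errors)
    binary_stream.write(data)
    binary_stream.flush()
    return data
```

When the target is sys.stdout.buffer, call sys.stdout.flush() first so that text still queued by earlier print() calls is not emitted after these bytes. Whether the result then displays as ♠ remains up to the console, as described above.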
