What's the deal with Python 3.4, Unicode, different languages and Windows?


Question

Happy examples:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

czech = u'Leoš Janáček'.encode("utf-8")
print(czech)

pl = u'Zdzisław Beksiński'.encode("utf-8")
print(pl)

jp = u'リング 山村 貞子'.encode("utf-8")
print(jp)

chinese = u'五行'.encode("utf-8")
print(chinese)

MIR = u'Машина для Инженерных Расчётов'.encode("utf-8")
print(MIR)

pt = u'Minha Língua Portuguesa: çáà'.encode("utf-8")
print(pt)

Not so happy output:

b'Leo\xc5\xa1 Jan\xc3\xa1\xc4\x8dek'
b'Zdzis\xc5\x82aw Beksi\xc5\x84ski'
b'\xe3\x83\xaa\xe3\x83\xb3\xe3\x82\xb0 \xe5\xb1\xb1\xe6\x9d\x91 \xe8\xb2\x9e\xe5\xad\x90'
b'\xe4\xba\x94\xe8\xa1\x8c'
b'\xd0\x9c\xd0\xb0\xd1\x88\xd0\xb8\xd0\xbd\xd0\xb0 \xd0\xb4\xd0\xbb\xd1\x8f \xd0\x98\xd0\xbd\xd0\xb6\xd0\xb5\xd0\xbd\xd0\xb5\xd1\x80\xd0\xbd\xd1\x8b\xd1\x85 \xd0\xa0\xd0\xb0\xd1\x81\xd1\x87\xd1\x91\xd1\x82\xd0\xbe\xd0\xb2'
b'Minha L\xc3\xadngua Portuguesa: \xc3\xa7\xc3\xa1\xc3\xa0'

And if I print them without encoding first, like this:

jp = u'リング 山村 貞子'
print(jp)

I get:

Traceback (most recent call last):
  File "x.py", line 5, in <module>
    print(jp)
  File "C:\Python34\lib\encodings\cp850.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-2: character maps to <undefined>

I've also tried the following from this question (and other alternatives that involve sys.stdout.encoding):

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from __future__ import print_function
import sys

def safeprint(s):
    try:
        print(s)
    except UnicodeEncodeError:
        if sys.version_info >= (3,):
            print(s.encode('utf8').decode(sys.stdout.encoding))
        else:
            print(s.encode('utf8'))

jp = u'リング 山村 貞子'
safeprint(jp)

And things get even more cryptic:

リング 山村 貞子

And the docs were not very helpful.

So, what's the deal with Python 3.4, Unicode, different languages and Windows? Almost all the examples I could find deal with Python 2.x.

Is there a general, cross-platform way of printing ANY Unicode character from any language in a decent and non-nasty way in Python 3.4?

I've tried typing this in the terminal:

chcp 65001

to change the code page, as proposed here and in the comments, but it did not work (including the attempt with sys.stdout.encoding).

Recommended answer

The problem was (see the Python 3.6 update below) with the Windows console, which supports only the ANSI character set (code page) appropriate for the region targeted by your version of Windows. By default, Python throws an exception when asked to output a character that the code page does not contain.
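To make the failure concrete, here is a minimal sketch (not part of the original answer) that reproduces it without a Windows console by encoding to cp850, the code page named in the question's traceback. The helper name `encodable` is made up for illustration. The last lines show why the `s.encode('utf8').decode(sys.stdout.encoding)` workaround from the question prints garbage: cp850 assigns a character to every byte value, so the decode succeeds but produces different characters.

```python
# Strings taken from the question.
samples = ['Minha Língua Portuguesa: çáà', 'リング 山村 貞子']

def encodable(text, encoding):
    """Return True if every character of text exists in the given code page."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

for s in samples:
    # ascii() keeps this demo printable on any console.
    print(ascii(s), encodable(s, 'cp850'))

# Decoding UTF-8 bytes with the wrong code page does not raise for cp850,
# but each UTF-8 byte becomes one unrelated cp850 character (mojibake).
mojibake = 'リング'.encode('utf-8').decode('cp850')
print(ascii(mojibake))
```

The Portuguese string is reported as encodable and the Japanese one is not, which matches the traceback in the question.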

Python can read an environment variable (PYTHONIOENCODING) to output in another encoding, or to change the default error handling. Below, I've read the console's default code page and changed the default error handling to print a ? instead of throwing an error for characters that are unsupported in the console's current code page.

C:\>chcp
Active code page: 437   # Note, US Windows OEM code page.

C:\>set PYTHONIOENCODING=437:replace

C:\>example.py
Leo? Janá?ek
Zdzis?aw Beksi?ski
??? ?? ??
??
?????? ??? ?????????? ????????
Minha Língua Portuguesa: çáà

Note that the US OEM code page is limited to ASCII and some Western European characters.
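The same replacement behaviour can be approximated inside a script with the `errors='replace'` argument of `str.encode`. This is only a sketch of what the `437:replace` setting does to the text; the function name `as_console_would_show` is hypothetical, and cp437 is assumed as the console code page, as in the session above.

```python
def as_console_would_show(text, encoding='cp437'):
    """Round-trip text through a code page, turning unsupported characters into '?'."""
    return text.encode(encoding, errors='replace').decode(encoding)

print(as_console_would_show('Zdzisław Beksiński'))  # Zdzis?aw Beksi?ski
print(as_console_would_show('五行'))                 # ??
```

On Python 3.7 and later, `sys.stdout.reconfigure(errors='replace')` should give a similar effect for `print` without setting the environment variable; that call does not exist in Python 3.4.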

Below I've instructed Python to use UTF-8, but since the Windows console doesn't support it, I redirect the output to a file and display it in Notepad:

C:\>set PYTHONIOENCODING=utf8
C:\>example >out.txt
C:\>notepad out.txt
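If redirecting at the command line is not convenient, a script can write the file itself with an explicit encoding. A minimal sketch, assuming a temporary-directory path chosen purely for illustration:

```python
import os
import tempfile

lines = ['Leoš Janáček', 'リング 山村 貞子', '五行']

# Hypothetical output location, used here only for the example.
path = os.path.join(tempfile.gettempdir(), 'out.txt')

# An explicit encoding makes the result independent of the console code page.
with open(path, 'w', encoding='utf-8') as f:
    for line in lines:
        f.write(line + '\n')

# Reading back with the same encoding recovers the original text.
with open(path, encoding='utf-8') as f:
    read_back = f.read().splitlines()
```

Because the console is never involved, this works the same way on Python 3.4 and on later versions.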

On Windows, it's best to use a Python IDE that supports UTF-8 instead of the console when working with multiple languages. If you only use one language, select it as the system locale in the Region and Language control panel and the console will support the characters of that language.

Update: Python 3.6 now uses the Windows Unicode APIs to write directly to the console, so the only limit is the console font's support for the characters. The following code works in a US Windows console. With a Chinese language pack installed, it even displays the Chinese and Japanese text if the console font is changed. Even without the correct font, replacement characters are shown in the console, and cutting and pasting into an environment such as this web page displays the characters correctly.

#!python3.6
#coding: utf8
czech = 'Leoš Janáček'
print(czech)

pl = 'Zdzisław Beksiński'
print(pl)

jp = 'リング 山村 貞子'
print(jp)

chinese = '五行'
print(chinese)

MIR = 'Машина для Инженерных Расчётов'
print(MIR)

pt = 'Minha Língua Portuguesa: çáà'
print(pt)

Output:

Leoš Janáček
Zdzisław Beksiński
リング 山村 貞子
五行
Машина для Инженерных Расчётов
Minha Língua Portuguesa: çáà
