Python unicode in Mac OS X terminal

Question

Can someone explain this odd thing to me?

When I type the following Cyrillic string in the Python shell:

>>> print 'абвгд'
абвгд

But when I type:

>>> print u'абвгд'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-9: ordinal not in range(128)

Since the first string came out correctly, I reckon my OS X terminal can represent unicode, but it turns out it can't in the second case. Why?

Recommended answer

>>> print 'абвгд'
абвгд

When you type in some characters, your terminal decides how these characters are represented to the application. Your terminal might give the characters to the application encoded as UTF-8, ISO-8859-5, or even something that only your terminal understands. Python gets these characters as a sequence of bytes. Python then prints these bytes out as they are, and your terminal interprets them in some way to display characters. Since your terminal usually interprets the bytes the same way it encoded them before, everything is displayed as you typed it in.
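The round trip described above can be sketched in Python 3, where byte strings and text are separate types. This is only an illustration, not the original session: it assumes the terminal uses UTF-8, and the variable names are made up for the example.

```python
# Assumption: the terminal encodes typed characters as UTF-8.
typed_text = 'абвгд'

# What the application receives from the terminal: a sequence of bytes.
raw_bytes = typed_text.encode('utf-8')
print(len(typed_text), len(raw_bytes))  # prints: 5 10

# Echoing the bytes back unchanged works, because the terminal decodes
# them with the same encoding it used to produce them.
shown_text = raw_bytes.decode('utf-8')
print(shown_text == typed_text)  # prints: True

# A terminal that assumed a different encoding (here ISO-8859-5)
# would turn the very same bytes into different characters.
misread_text = raw_bytes.decode('iso-8859-5')
print(misread_text == typed_text)  # prints: False
```

The bytes themselves never change here; only the interpretation applied to them does.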

>>> u'абвгд'

Here you type in some characters that arrive at the Python interpreter as a sequence of bytes, maybe encoded in some way by the terminal. With the u prefix, Python tries to convert this data to unicode. To do this correctly, Python has to know what encoding your terminal uses. In your case it looks like Python assumes your terminal's encoding is ASCII, which has no Cyrillic characters, so you get an encoding error when the string is printed.
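The failure itself is easy to reproduce without a terminal. The following is a minimal Python 3 sketch (the question uses Python 2, so the mechanics of the interactive shell differ); it only shows that the ASCII codec cannot represent Cyrillic text, while a codec such as UTF-8 can.

```python
text = 'абвгд'  # five Cyrillic characters

try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    # ASCII only covers code points 0-127, so every character is rejected.
    failed_codec = exc.encoding
    failed_range = (exc.start, exc.end)
    print(failed_codec, failed_range)  # prints: ascii (0, 5)

# A codec that contains Cyrillic characters encodes the text without error.
utf8_bytes = text.encode('utf-8')
print(len(utf8_bytes))  # prints: 10

# Treating each of those bytes as one character gives ten characters.
ten_chars = utf8_bytes.decode('latin-1')
print(len(ten_chars))  # prints: 10
```

The traceback in the question reports positions 0-9, that is, ten characters for a five-letter string. That would be consistent with each of the ten UTF-8 bytes having been treated as a separate character before encoding, although this is an inference and not something the answer states.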

The straightforward way to create unicode strings in an interactive session would therefore be something like this:

>>> us = 'абвгд'.decode('my-terminal-encoding')
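In the line above, 'my-terminal-encoding' is a placeholder for whatever your terminal really uses. A rough Python 3 sketch of the same idea follows; decode_terminal_bytes is a hypothetical helper written for this example, and falling back to sys.stdin.encoding and then the locale's preferred encoding is an assumption about what a sensible default might be.

```python
import locale
import sys


def decode_terminal_bytes(raw, encoding=None):
    """Decode bytes received from a terminal (hypothetical helper)."""
    if encoding is None:
        # Assumption: use what Python detected for stdin, else the locale.
        encoding = (getattr(sys.stdin, 'encoding', None)
                    or locale.getpreferredencoding(False))
    return raw.decode(encoding)


# The UTF-8 bytes for the five Cyrillic letters from the question.
raw = b'\xd0\xb0\xd0\xb1\xd0\xb2\xd0\xb3\xd0\xb4'

# Naming the encoding explicitly avoids relying on any guess.
us = decode_terminal_bytes(raw, 'utf-8')
print(len(us))  # prints: 5
```

Passing the encoding explicitly is the safer choice whenever you know it, because the detected value depends on how the interpreter was started.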

In files you can also specify the encoding of the file with a special mode line:

# -*- encoding: ISO-8859-5 -*-
us = u'абвгд'
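To see that such a mode line really controls how the source bytes are read, one can compile source text held in memory. This is a small Python 3 sketch under that assumption; the '<demo>' file name and the variable names are made up for the example.

```python
# Build the two-line file from above as ISO-8859-5 encoded bytes.
source_text = "# -*- encoding: ISO-8859-5 -*-\nus = u'абвгд'\n"
source_bytes = source_text.encode('iso-8859-5')

# In ISO-8859-5 every Cyrillic letter is a single byte, so the five
# letters occupy five bytes here instead of ten as in UTF-8.
print(len(source_text) == len(source_bytes))  # prints: True

# compile() reads the declaration in the first line and decodes the
# remaining bytes with the declared encoding.
namespace = {}
exec(compile(source_bytes, '<demo>', 'exec'), namespace)

print(len(namespace['us']))  # prints: 5
```

Without the declaration, Python 3 would try to read the same bytes as UTF-8 and reject them.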

For other ways to set the default input encoding, you can look at sys.setdefaultencoding(...) or sys.stdin.encoding.
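Before changing anything, it can help to check what Python has detected. Below is a minimal sketch, written for Python 3, where sys.setdefaultencoding is no longer available; describe_encodings is a hypothetical helper name, and the locale entry is an addition not mentioned in the answer.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings Python currently uses (hypothetical helper)."""
    return {
        # May be None when a stream is not attached to a terminal.
        'stdin': getattr(sys.stdin, 'encoding', None),
        'stdout': getattr(sys.stdout, 'encoding', None),
        # Used for implicit conversions between bytes and text.
        'default': sys.getdefaultencoding(),
        # What the operating system locale suggests.
        'locale': locale.getpreferredencoding(False),
    }


for name, value in sorted(describe_encodings().items()):
    print(name, value)
```

If the stdin or stdout entry shows an ASCII codec in a terminal that actually uses UTF-8, the PYTHONIOENCODING environment variable is one documented way to override it when starting the interpreter.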
