Understanding Python Unicode and Linux terminal


Question

I have a Python script that writes some strings with UTF-8 encoding. In my script I mainly use the str() function to cast to string. It looks like this:

mystring="this is unicode string:"+japanesevalues[1] 
#japanesevalues is a list of unicode values, I am sure it is unicode
print mystring

I am not using the Python terminal, just the standard terminal on Red Hat Linux x86_64, which I have set to output UTF-8 characters.

If I do this:

#python myscript.py
this is unicode string: カラダーズ ソフィー

But if I do this:

#python myscript.py > output

I get the typical error:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 253-254: ordinal not in range(128)

Why is that?

Recommended answer

The terminal has a character set, and Python knows what that character set is, so it automatically encodes your Unicode strings into the byte encoding that the terminal uses, in your case UTF-8.

But when you redirect, you are no longer writing to the terminal. The output now goes to a file or Unix pipe, which has no character set, so Python has no way of knowing which encoding you want and falls back to a default. You tagged your question "Python-3.x", but your print syntax is Python 2, so I suspect you are actually using Python 2. In that case sys.getdefaultencoding() is generally 'ascii', and in your case it definitely is. Japanese characters cannot be encoded as ASCII, so you get the error.
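The failure can be reproduced without any redirection by encoding the text explicitly. This is a minimal sketch; the helper name try_encode is made up for illustration, and the sample text is the string from the question.

```python
# -*- coding: utf-8 -*-
# Sample text taken from the question's terminal output.
text = u"カラダーズ ソフィー"


def try_encode(value, encoding):
    """Hypothetical helper: return the encoded bytes, or None if the codec fails."""
    try:
        return value.encode(encoding)
    except UnicodeEncodeError:
        # Raised when the codec has no representation for a character,
        # which is what happens with ASCII and Japanese text.
        return None


ascii_result = try_encode(text, "ascii")   # ASCII cannot represent the text
utf8_result = try_encode(text, "utf-8")    # UTF-8 can represent any Unicode text
```

Here ascii_result is None because the ASCII codec raised UnicodeEncodeError, the same error Python 2 reports when it falls back to 'ascii' for redirected output.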

Your best bet when using Python 2 is to encode the string as UTF-8 before printing it. Redirection will then work, and the resulting file will be UTF-8. The catch is that the output will be wrong if your terminal uses some other encoding. You can handle that by reading the terminal encoding from sys.stdout.encoding and using it when it is set (it will be None when redirecting under Python 2).
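One way to combine both ideas is to use the stream's encoding when it has one and fall back to UTF-8 otherwise. The following is only a sketch: encode_for_output and RedirectedStream are hypothetical names, and UTF-8 as the fallback is an assumption about what you want in the redirected file.

```python
# -*- coding: utf-8 -*-
import sys


def encode_for_output(text, stream=None, fallback="utf-8"):
    """Hypothetical helper: encode text for a stream, defaulting to UTF-8.

    Uses stream.encoding when it is set (a terminal) and the fallback
    when it is None or missing (redirected output under Python 2).
    """
    stream = sys.stdout if stream is None else stream
    encoding = getattr(stream, "encoding", None) or fallback
    return text.encode(encoding)


class RedirectedStream(object):
    """Stand-in for a redirected Python 2 stdout, whose encoding is None."""
    encoding = None


data = encode_for_output(u"カラダーズ ソフィー", RedirectedStream())
```

Under Python 2 the script from the question could then use print encode_for_output(mystring). Note that if the terminal reports an encoding that cannot represent the text, encode() will still raise UnicodeEncodeError.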

In Python 3, your code should work as is, except that you need to change print mystring to print(mystring).
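In Python 3, sys.stdout is a text stream that encodes on write, and for redirected output the encoding normally comes from the locale. The sketch below simulates that with an explicit UTF-8 wrapper around an in-memory buffer, so it does not depend on the environment. The contents of japanesevalues are invented for illustration.

```python
import io

# Hypothetical data standing in for the list from the question.
japanesevalues = ["", "カラダーズ ソフィー"]
mystring = "this is unicode string:" + japanesevalues[1]

# An in-memory byte buffer plays the role of the redirected file.
raw = io.BytesIO()
# The text wrapper encodes str to UTF-8 bytes on write, as a Python 3
# stdout does when its encoding is UTF-8. newline="\n" disables
# platform-specific newline translation.
redirected = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")

print(mystring, file=redirected)
redirected.flush()

written = raw.getvalue()  # the bytes that would end up in the file
```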
