UnicodeEncodeError: 'ascii' codec can't encode character u'\u03c0'
Problem description
I am trying to extract the value of the Body attribute from the row element in pi.xml.
cat pi.xml
<?xml version="1.0" encoding="utf-8"?>
<posts>
<row Id="19" Body=" The value of π, the value of pi." />
</posts>
The Python file, pi.py:
from lxml import etree
doc = etree.parse('pi.xml')
r = doc.findall('row')
for i in r:
    print (i.get('Body'))
And the locale:
$ locale
LANG=en_IN
LANGUAGE=en_IN:en
LC_CTYPE="en_IN"
LC_NUMERIC="en_IN"
LC_TIME="en_IN"
LC_COLLATE="en_IN"
LC_ALL=
Running pi.py as python pi.py works fine.
But if I redirect the output and run it as python pi.py >> pi.txt, I get this error message:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u03c0' in position 101: ordinal not in range(128)
If I change print (i.get('Body')) to print (i.get('Body')).encode('utf-8'), then python pi.py >> pi.txt works fine. But is this the proper way to do it?
Operating system: Ubuntu.
Recommended answer
Use:
PYTHONIOENCODING=utf8 python pi.py >> py.txt
But if your script explicitly encodes its output, such as:
print u'somestring'.encode('utf8')
then PYTHONIOENCODING has no effect, because the script is already writing encoded bytes rather than Unicode. Scripts should instead just print Unicode and let the terminal decide the encoding, as in:
print u'somestring'
Python will automatically encode the output as UTF-8 if the console is configured for UTF-8.
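To see which encoding Python has attached to standard output in a given run, a small diagnostic along these lines may help. It is only a sketch: the helper name describe_stdout is made up for illustration, and the report goes to stderr so that it stays visible when stdout is redirected.

```python
import sys


def describe_stdout(stream=None):
    """Return (is_terminal, encoding) for a stream, defaulting to sys.stdout."""
    stream = sys.stdout if stream is None else stream
    # isatty() is True only when the stream is connected to a terminal.
    is_terminal = bool(hasattr(stream, "isatty") and stream.isatty())
    # encoding may be missing or None, e.g. for redirected stdout on Python 2.
    encoding = getattr(stream, "encoding", None)
    return is_terminal, encoding


if __name__ == "__main__":
    # Write to stderr so the report is not swallowed by a stdout redirect.
    sys.stderr.write("terminal=%s encoding=%s\n" % describe_stdout())
```

Saved under a hypothetical name such as check_stdout.py, the script can be run once directly and once with stdout redirected to a file to compare the two reports. Under Python 2 the redirected run should report an encoding of None, which is the situation in which print falls back to ascii; Python 3 usually reports the locale's encoding instead.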
In your redirection case, Python doesn't know what encoding to use when printing Unicode, so it defaults to ascii. Since redirection is a shell feature, leave specifying the encoding to the shell using:
PYTHONIOENCODING=utf8 python pi.py >> py.txt
This leaves the option open to use other encodings without modifying the script.
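As a rough illustration of that point, the sketch below runs the same one-line child script twice with stdout piped (so it is not a terminal) and changes only the PYTHONIOENCODING environment variable. The helper name run_with_io_encoding and the choice of iso8859-7 as the second encoding are assumptions made for this example, not part of the original answer.

```python
import os
import subprocess
import sys


def run_with_io_encoding(encoding):
    """Run a child interpreter that prints pi and return the raw stdout bytes."""
    # Copy the current environment and set only PYTHONIOENCODING for the child.
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    # The child script is identical for every encoding: it just prints Unicode.
    child_code = "print(u'\\u03c0')"
    return subprocess.check_output([sys.executable, "-c", child_code], env=env)


utf8_bytes = run_with_io_encoding("utf8")
greek_bytes = run_with_io_encoding("iso8859-7")

# The child script never changed; only the environment variable did.
print(repr(utf8_bytes.strip()))
print(repr(greek_bytes.strip()))
```

The two results differ only because of the environment variable, which is the reason for leaving the encoding decision outside the script.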