UnicodeEncodeError: 'ascii' codec can't encode character u'\u03c0'

Problem description

I am trying to extract the value of the Body attribute from the row element in pi.xml.

    cat pi.xml
    <?xml version="1.0" encoding="utf-8"?>
    <posts>
         <row Id="19" Body=" The value of π, the value of pi." />
    </posts>

The Python file, pi.py:

    from lxml import etree
    doc = etree.parse('pi.xml')
    r = doc.findall('row')
    for i in r:
        print (i.get('Body'))

And the locale:

    $ locale
    LANG=en_IN
    LANGUAGE=en_IN:en
    LC_CTYPE="en_IN"
    LC_NUMERIC="en_IN"
    LC_TIME="en_IN"
    LC_COLLATE="en_IN"    
    LC_ALL=

Running pi.py as python pi.py works fine.
But if I redirect the output and run it as python pi.py >> pi.txt, I get this error message: UnicodeEncodeError: 'ascii' codec can't encode character u'\u03c0' in position 101: ordinal not in range(128)
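The same failure can be reproduced without any redirection. This is a minimal sketch, assuming (as the error message suggests) that the redirected print ends up encoding the text with the ascii codec; the helper name `can_encode` is made up for illustration and is not part of the original script.

```python
# -*- coding: utf-8 -*-

def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Same text as the Body attribute in pi.xml, with pi written as an escape.
body = u" The value of \u03c0, the value of pi."

# ascii covers only code points below 128, so U+03C0 cannot be encoded.
print(can_encode(body, "ascii"))

# UTF-8 can represent every Unicode code point, including U+03C0.
print(can_encode(body, "utf-8"))
```

The explicit `.encode('utf-8')` in the question avoids the error for the same reason: the text reaches the output stream already encoded, so no ascii encoding step is attempted.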

If I change print (i.get('Body')) to print (i.get('Body')).encode('utf-8'), then python pi.py >> pi.txt works fine. But is this the proper way to do it?

Operating system: Ubuntu.

Recommended answer

Use:

    PYTHONIOENCODING=utf8 python pi.py >> pi.txt

But if your script explicitly encodes its output, such as:

    print u'somestring'.encode('utf8')

this method won't help, because PYTHONIOENCODING only applies to Unicode that Python encodes itself, and the script is already writing encoded bytes. Scripts should instead just print Unicode and let the terminal decide the encoding, as in:

    print u'somestring'

Python will automatically encode the output as UTF-8 if the console is configured for UTF-8.
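To see which encoding Python has picked for the current output stream, the stream itself can be inspected. The sketch below is only a diagnostic aid under the assumption that `sys.stdout` is a regular file-like object; the function name `describe_stdout` is hypothetical. The report goes to stderr so that it stays visible when stdout is redirected.

```python
import sys


def describe_stdout():
    """Report whether stdout is a terminal and which encoding is attached to it."""
    is_terminal = sys.stdout.isatty()
    # On Python 2 this attribute can be None when output is redirected.
    encoding = getattr(sys.stdout, "encoding", None)
    return is_terminal, encoding


is_terminal, encoding = describe_stdout()
# Write to stderr so the report is not swallowed by a stdout redirect.
sys.stderr.write("isatty=%s encoding=%s\n" % (is_terminal, encoding))
```

Running it once directly and once with `>> pi.txt` should show the difference between the terminal case and the redirected case.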

For your redirection case, Python doesn't know what encoding to use when printing Unicode, so it defaults to ascii. Since redirection is a shell feature, leave specifying the encoding to the shell using:

    PYTHONIOENCODING=utf8 python pi.py >> pi.txt

This leaves the option open to use other encodings without modifying the script.
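The effect of the environment variable can be checked from Python by starting a child interpreter with its stdout attached to a pipe, which is roughly the situation a shell redirect creates. This is a sketch rather than part of the original answer: the helper `run_child` and the one-line child program are made up for illustration.

```python
import os
import subprocess
import sys


def run_child(io_encoding):
    """Run a child interpreter that prints pi with PYTHONIOENCODING set.

    Returns the child's exit code and the raw bytes it wrote to stdout.
    """
    env = dict(os.environ)
    env["PYTHONIOENCODING"] = io_encoding
    # The child receives the text print(u'\u03c0') and prints the pi character.
    child_code = "print(u'\\u03c0')"
    proc = subprocess.Popen(
        [sys.executable, "-c", child_code],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out, _ = proc.communicate()
    return proc.returncode, out


# With utf8 the child succeeds and emits the UTF-8 bytes for pi.
utf8_code, utf8_out = run_child("utf8")
print(utf8_code, repr(utf8_out))

# With ascii the child hits the same UnicodeEncodeError as in the question.
ascii_code, ascii_out = run_child("ascii")
print(ascii_code, repr(ascii_out))
```

Because the encoding is chosen outside the script, the child program itself stays unchanged between the two runs.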
