ElementTree and unicode


Question

I have this character in an XML file:

<data>
  <products>
      <color>fumè</color>
  </products>
</data>

I try to generate an instance of ElementTree with the following code:

string_data = open('file.xml')
x = ElementTree.fromstring(unicode(string_data.encode('utf-8')))

and I get the following error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 185: ordinal not in range(128)

(NOTE: The position is not exact; I extracted this XML sample from a larger file.)

How can I solve this? Thanks.

Solution

You do not need to decode the XML yourself for ElementTree to work. XML carries its own encoding information (defaulting to UTF-8), and ElementTree does the decoding for you, returning unicode:

>>> from xml.etree import ElementTree
>>> data = '''\
... <data>
...   <products>
...       <color>fumè</color>
...   </products>
... </data>
... '''
>>> x = ElementTree.fromstring(data)
>>> x[0][0].text
u'fum\xe8'
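To illustrate the point about XML carrying its own encoding information, here is a minimal sketch. It assumes Python 3 (the session above is Python 2) and uses a made-up sample document; the variable names are illustrative only. The same text is encoded once as UTF-8 without a declaration and once as ISO-8859-1 with an explicit XML declaration, and the raw bytes are handed to ElementTree.fromstring() without any manual decoding.

```python
from xml.etree import ElementTree

# Hypothetical sample document, mirroring the one in the question.
xml_text = "<data><products><color>fumè</color></products></data>"

# UTF-8 bytes with no XML declaration: the parser falls back to UTF-8.
utf8_bytes = xml_text.encode("utf-8")

# ISO-8859-1 bytes: the encoding is announced in the XML declaration.
latin1_bytes = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>' + xml_text
).encode("iso-8859-1")

# Pass the raw bytes straight to the parser; no decode()/encode() calls.
color_from_utf8 = ElementTree.fromstring(utf8_bytes)[0][0].text
color_from_latin1 = ElementTree.fromstring(latin1_bytes)[0][0].text

print(color_from_utf8 == color_from_latin1)
```

Both calls should yield the same text, since the parser decodes each byte string according to the document's declared (or default) encoding.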

If your data is in a file or file-like object, pass the filename or the file object directly to the ElementTree.parse() function:

x = ElementTree.parse('file.xml')
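As a rough sketch of both ways of calling ElementTree.parse(), the following assumes Python 3 and writes a small sample file to a temporary directory so that it can run on its own; the file contents and variable names are assumptions made for the example, not part of the original question.

```python
import os
import tempfile
from xml.etree import ElementTree

# Hypothetical file contents, stored as UTF-8 bytes.
xml_bytes = "<data><products><color>fumè</color></products></data>".encode("utf-8")

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "file.xml")
    with open(path, "wb") as handle:
        handle.write(xml_bytes)

    # Option 1: pass the filename and let ElementTree open the file.
    tree_from_name = ElementTree.parse(path)

    # Option 2: pass a file object opened in binary mode, so the parser
    # sees the raw bytes and applies the document's own encoding.
    with open(path, "rb") as handle:
        tree_from_file = ElementTree.parse(handle)

# parse() returns an ElementTree; getroot() gives the <data> element.
root = tree_from_name.getroot()
color = root.find("products/color").text
print(color)
```

Note that parse() returns an ElementTree wrapper rather than the root element itself, so getroot() is needed before indexing into the document the way the fromstring() example does.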
