ElementTree and unicode
I have this char in an xml file:
<data>
<products>
<color>fumè</color>
</products>
</data>
I try to generate an instance of ElementTree with the following code:
string_data = open('file.xml')
x = ElementTree.fromstring(unicode(string_data.encode('utf-8')))
and I get the following error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 185: ordinal not in range(128)
(NOTE: The position is not exact, I sampled the xml from a larger one).
How can I solve it? Thanks
You do not need to decode XML for ElementTree to work. XML carries its own encoding information (defaulting to UTF-8) and ElementTree does the work for you, outputting unicode:
>>> data = '''
... <data>
... <products>
... <color>fumè</color>
... </products>
... </data>
... '''
>>> x = ElementTree.fromstring(data)
>>> x[0][0].text
u'fum\xe8'
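The session above is Python 2. As a rough Python 3 sketch of the same idea, the snippet below hands raw bytes to ElementTree.fromstring() and lets the parser pick the encoding from the XML declaration. The ISO-8859-1 declaration and the variable names are assumptions made for illustration; they are not taken from the original file.

```python
import xml.etree.ElementTree as ElementTree

# Raw bytes, as they would be read from disk in binary mode.
# The XML declaration tells the parser which encoding to use
# (ISO-8859-1 here is an illustrative assumption).
xml_bytes = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    '<data><products><color>fum\xe8</color></products></data>'
).encode('iso-8859-1')

# No manual decode or encode step: the parser handles it.
root = ElementTree.fromstring(xml_bytes)
color_text = root[0][0].text

# Without a declaration, the bytes are treated as UTF-8.
utf8_bytes = '<data><products><color>fum\xe8</color></products></data>'.encode('utf-8')
utf8_text = ElementTree.fromstring(utf8_bytes)[0][0].text

print(repr(color_text), repr(utf8_text))
```

In both cases the element text comes back as an ordinary text string, so there is nothing left to convert by hand.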
If your data is contained in a file or file-like object, just pass the filename or file object directly to the ElementTree.parse() function:
x = ElementTree.parse('file.xml')
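Note that ElementTree.parse() returns an ElementTree wrapper rather than the root element, so a call to getroot() is usually the next step. A minimal Python 3 sketch, assuming a UTF-8 file written to a temporary directory purely so the example is self-contained (the path and sample content are placeholders):

```python
import os
import tempfile
import xml.etree.ElementTree as ElementTree

# Sample content standing in for the real file.xml (assumed UTF-8).
xml_bytes = '<data><products><color>fum\xe8</color></products></data>'.encode('utf-8')

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'file.xml')
    with open(path, 'wb') as f:
        f.write(xml_bytes)

    # Pass the filename straight to parse(); no open() or decode needed.
    tree = ElementTree.parse(path)

# parse() returns an ElementTree; getroot() gives the <data> element.
root = tree.getroot()
color_text = root.find('products/color').text
print(repr(color_text))
```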