Write xml utf-8 file with utf-8 data with ElementTree


Question

I'm trying to write an XML file containing UTF-8 encoded data using ElementTree, like this:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import xml.etree.ElementTree as ET
import codecs

testtag = ET.Element('unicodetag')
testtag.text = u'Töreboda'  # the second letter is ö (o with two dots over it)
expfile = codecs.open('testunicode.xml',"w","utf-8-sig")
ET.ElementTree(testtag).write(expfile,encoding="UTF-8",xml_declaration=True)
expfile.close()

This blows up with the following error:

Traceback (most recent call last):
  File "unicodetest.py", line 10, in <module>
    ET.ElementTree(testtag).write(expfile,encoding="UTF-8",xml_declaration=True)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 815, in write
    serialize(write, self._root, encoding, qnames, namespaces)    
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 932, in _serialize_xml
    write(_escape_cdata(text, encoding))
  File "/usr/lib/python2.7/codecs.py", line 691, in write
    return self.writer.write(data)
  File "/usr/lib/python2.7/codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)

Using the "us-ascii" encoding instead works fine, but it doesn't preserve the Unicode characters in the data. What is happening?

Recommended answer

The file object returned by codecs.open expects Unicode strings to be written to it, and it handles the encoding to UTF-8 itself. ElementTree's write, however, already encodes the Unicode strings to UTF-8 byte strings before sending them to the file object. Since the file object wants Unicode strings, Python 2 coerces each byte string back to Unicode using the default ascii codec, which fails on the first non-ASCII byte and causes the UnicodeDecodeError.
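The two steps described above can be reproduced in isolation. The following is a minimal sketch, not the code path ElementTree itself runs: it uses ET.tostring to show that an explicit encoding yields bytes, then performs the ASCII decode explicitly (Python 2 does it implicitly inside the codecs writer). The variable names serialized, encoded_text and decode_error are introduced here for illustration, and the text is written with a \u escape so the snippet does not depend on the source file's encoding.

```python
import xml.etree.ElementTree as ET

testtag = ET.Element('unicodetag')
testtag.text = u'T\u00f6reboda'  # "Töreboda", written with an escape

# Step 1: with encoding="UTF-8", ElementTree returns encoded bytes,
# not a Unicode string.
serialized = ET.tostring(testtag, encoding="UTF-8")

# Step 2: decoding those bytes as ASCII fails on the first byte of "ö".
# This explicit decode stands in for the implicit coercion Python 2
# performs when a byte string reaches a codecs.open file object.
encoded_text = testtag.text.encode("utf-8")
try:
    encoded_text.decode("ascii")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

print(type(serialized))
print(decode_error)
```

The error position matches the traceback in the question: "T" is byte 0, and the first byte of the UTF-8 encoding of "ö" (0xc3) is byte 1.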

Just do the following instead:

#expfile = codecs.open('testunicode.xml',"w","utf-8-sig")
ET.ElementTree(testtag).write('testunicode.xml',encoding="UTF-8",xml_declaration=True)
#expfile.close()
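As a self-contained check of this fix, the sketch below writes the file by name and reads it back. The temporary directory and the names path, raw and roundtrip are assumptions added for illustration; the write call itself is the one from the answer. Note that, unlike the "utf-8-sig" codec in the question, this writes no byte order mark.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

testtag = ET.Element('unicodetag')
testtag.text = u'T\u00f6reboda'  # "Töreboda", written with an escape

# Hypothetical output location, used only so the example cleans up easily.
out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, 'testunicode.xml')

# Pass the file name and let ElementTree open the file and encode the data.
ET.ElementTree(testtag).write(path, encoding="UTF-8", xml_declaration=True)

# Read the raw bytes back to confirm the UTF-8 encoding on disk.
with open(path, 'rb') as f:
    raw = f.read()

# Parse the file again to confirm the text survives the round trip.
roundtrip = ET.parse(path).getroot().text
```

If a file object is needed rather than a file name, opening it in binary mode (open(path, 'wb')) should also work, since ElementTree then delivers the encoded bytes to an object that accepts bytes.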
