Python: Unicode and ElementTree.parse
I'm trying to move to Python 2.7, and since Unicode is a Big Deal there, I'd like to try working with XML files and text and parsing them with the xml.etree.cElementTree library. But I ran across this error:
>>> import xml.etree.cElementTree as ET
>>> from io import StringIO
>>> source = """\
... <?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
... <root>
...   <Parent>
...     <Child>
...       <Element>Text</Element>
...     </Child>
...   </Parent>
... </root>
... """
>>> srcbuf = StringIO(source.decode('utf-8'))
>>> doc = ET.parse(srcbuf)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 56, in parse
  File "<string>", line 35, in parse
cElementTree.ParseError: no element found: line 1, column 0
The same thing happens when I open the file with io.open('filename.xml', encoding='utf-8') and pass it to ET.parse:
>>> with io.open('test.xml', mode='w', encoding='utf-8') as fp:
...     fp.write(source.decode('utf-8'))
...
150L
>>> with io.open('test.xml', mode='r', encoding='utf-8') as fp:
...     fp.read()
...
u'<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>\n<root>\n  <Parent>\n    <Child>\n      <Element>Text</Element>\n    </Child>\n  </Parent>\n</root>\n'
>>> with io.open('test.xml', mode='r', encoding='utf-8') as fp:
...     ET.parse(fp)
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "<string>", line 56, in parse
  File "<string>", line 35, in parse
cElementTree.ParseError: no element found: line 1, column 0
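Both failing cases hand ET.parse a text-mode stream, while opening the file in binary mode works (see the edit below). The short sketch that follows, which should run on Python 2.7 and Python 3, only compares what read() returns from each kind of io buffer. That Python 2.7's cElementTree consumes nothing but byte strings from read() is an assumption here, consistent with "no element found: line 1, column 0" (the parser apparently receives no data at all), not something confirmed by this thread.

```python
import io

# A text-mode buffer, like the io.StringIO in the first example or a file
# opened with io.open(..., encoding='utf-8'): read() returns text
# (type unicode on Python 2.7, str on Python 3).
text_chunk = io.StringIO(u'<root/>').read()

# A binary buffer, like a file opened with mode='rb': read() returns bytes
# (type str on Python 2.7, bytes on Python 3).
byte_chunk = io.BytesIO(u'<root/>'.encode('utf-8')).read()

print(type(text_chunk).__name__)
print(type(byte_chunk).__name__)
```

On Python 2.7 this prints unicode and then str; on Python 3 it prints str and then bytes.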
Is there something about unicode and ET parsing that I am missing here?
Edit: Apparently the ET parser does not play well with a Unicode input stream? The following works:
>>> with io.open('test.xml', mode='rb') as fp:
...     ET.parse(fp)
...
<ElementTree object at 0x0180BC10>
But this also means I cannot use io.StringIO to parse in-memory text unless I first encode it into an in-memory byte buffer?
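Encoding into an in-memory byte buffer is straightforward with io.BytesIO. Below is a minimal sketch of that workaround; the helper name parse_xml_text is hypothetical, and it assumes the chosen encoding matches the one named in the XML declaration (UTF-8 here). If they disagree, the parser may mis-decode or reject non-ASCII characters.

```python
import io

try:
    import xml.etree.cElementTree as ET  # C accelerator on Python 2.7
except ImportError:
    import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9


def parse_xml_text(text, encoding='utf-8'):
    """Parse an in-memory unicode XML document (hypothetical helper).

    The text is encoded back to bytes and wrapped in a BytesIO, so the
    parser reads byte strings, just as from a file opened in 'rb' mode.
    `encoding` should match the encoding named in the XML declaration.
    """
    return ET.parse(io.BytesIO(text.encode(encoding)))


source = u"""<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<root>
  <Parent>
    <Child>
      <Element>Text</Element>
    </Child>
  </Parent>
</root>
"""

doc = parse_xml_text(source)
print(doc.getroot().tag)                      # root
print(doc.find('Parent/Child/Element').text)  # Text
```

For files on disk, the binary-mode open from the edit does the same job; passing the path itself, e.g. ET.parse('test.xml'), also works, because ElementTree then opens the file in binary mode on its own.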
Can't you use doc = ET.fromstring(source) in your first example?
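A minimal sketch of the fromstring() suggestion, assuming the document is held as unicode text. In the original Python 2.7 session source is already a byte string, so it could be passed as-is; encoding unicode text to UTF-8 first sidesteps the question of how Python 2.7's parser treats unicode input, and the bytes agree with the encoding="UTF-8" declaration. Note that fromstring() returns the root Element rather than an ElementTree; ET.ElementTree(root) wraps it if a tree object is needed.

```python
try:
    import xml.etree.cElementTree as ET  # C accelerator on Python 2.7
except ImportError:
    import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9

source = u"""<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<root>
  <Parent>
    <Child>
      <Element>Text</Element>
    </Child>
  </Parent>
</root>
"""

# fromstring() parses the whole document with no file-like object involved.
# UTF-8 bytes are accepted on both Python 2.7 and Python 3.
root = ET.fromstring(source.encode('utf-8'))

print(root.tag)                                # root
print(root.find('Parent/Child/Element').text)  # Text
```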