lxml iterparse in Python can't handle namespaces

Views: 98 | Published: 2020/5/4 8:33:50 | Tags: python lxml iterparse

Question

# Python 2
from lxml import etree
import StringIO

data = StringIO.StringIO('<root xmlns="http://some.random.schema"><a>One</a><a>Two</a><a>Three</a></root>')
docs = etree.iterparse(data, tag='a')
a, b = docs.next()  # raises StopIteration, see the traceback below


Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "iterparse.pxi", line 478, in lxml.etree.iterparse.__next__ (src/lxml/lxml.etree.c:95348)
  File "iterparse.pxi", line 534, in lxml.etree.iterparse._read_more_events (src/lxml/lxml.etree.c:95938)
StopIteration

This works fine until I add the namespace to the root node. Any ideas for a workaround, or for the correct way of doing this? I need an event-driven parser because the files are very large.

Recommended answer

When a namespace is attached, the tag isn't a, it's {http://some.random.schema}a, so a filter of tag='a' matches nothing and the iterator ends immediately. Try this (Python 3):

from lxml import etree
from io import BytesIO

xml = '''\
<root xmlns="http://some.random.schema">
  <a>One</a>
  <a>Two</a>
  <a>Three</a>
</root>'''
data = BytesIO(xml.encode())
docs = etree.iterparse(data, tag='{http://some.random.schema}a')
for event, elem in docs:
    print(f'{event}: {elem}')

Or, in Python 2:

from lxml import etree
from StringIO import StringIO

xml = '''\
<root xmlns="http://some.random.schema">
  <a>One</a>
  <a>Two</a>
  <a>Three</a>
</root>'''
data = StringIO(xml)
docs = etree.iterparse(data, tag='{http://some.random.schema}a')
for event, elem in docs:
    print event, elem

The Python 3 version prints the following (the memory addresses will differ):

end: <Element {http://some.random.schema}a at 0x10941e730>
end: <Element {http://some.random.schema}a at 0x10941e8c0>
end: <Element {http://some.random.schema}a at 0x10941e960>
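
The qualified tag shown in that output can be split back into its namespace and local name when only the plain name is wanted. The following is a minimal sketch, not part of the original answer; it relies on lxml's etree.QName, and the helper name namespace_and_name is made up for illustration.

```python
from io import BytesIO
from lxml import etree

XML = (b'<root xmlns="http://some.random.schema">'
       b'<a>One</a><a>Two</a><a>Three</a></root>')


def namespace_and_name(source, tag):
    """Collect (namespace, local name, text) for each matching element.

    Hypothetical helper for illustration only.
    """
    rows = []
    for _event, elem in etree.iterparse(source, tag=tag):
        qname = etree.QName(elem)  # splits '{uri}name' into its two parts
        rows.append((qname.namespace, qname.localname, elem.text))
    return rows


rows = namespace_and_name(BytesIO(XML), '{http://some.random.schema}a')
for row in rows:
    print(row)
```

Comparing qname.localname instead of elem.tag keeps the rest of the code independent of the namespace URI.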

As @mihail-shcheglov pointed out, the wildcard form {*}a can also be used; it matches a in any namespace or in no namespace:

from lxml import etree
from io import BytesIO

xml = '''\
<root xmlns="http://some.random.schema">
  <a>One</a>
  <a>Two</a>
  <a>Three</a>
</root>'''
data = BytesIO(xml.encode())
docs = etree.iterparse(data, tag='{*}a')
for event, elem in docs:
    print(f'{event}: {elem}')
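
Since the question mentions very large files, it may be worth freeing each element once it has been handled, because iterparse otherwise keeps building the full tree in memory. The sketch below is an assumption layered on top of the answer, not something the original author wrote: it combines the {*}a wildcard with the commonly used clear-and-delete pattern, and the generator name iter_texts is hypothetical.

```python
from io import BytesIO
from lxml import etree

NAMESPACED = (b'<root xmlns="http://some.random.schema">'
              b'<a>One</a><a>Two</a><a>Three</a></root>')
PLAIN = b'<root><a>One</a><a>Two</a></root>'


def iter_texts(source, tag='{*}a'):
    """Yield the text of each matching element, then release it.

    Hypothetical helper; '{*}a' matches <a> in any namespace or in none.
    """
    for _event, elem in etree.iterparse(source, events=('end',), tag=tag):
        yield elem.text
        elem.clear()  # drop the element's text, attributes and children
        # Remove already-processed siblings so the tree does not keep growing.
        while elem.getprevious() is not None:
            del elem.getparent()[0]


namespaced_texts = list(iter_texts(BytesIO(NAMESPACED)))
plain_texts = list(iter_texts(BytesIO(PLAIN)))
print(namespaced_texts, plain_texts)
```

The same generator works for both documents because the wildcard ignores the namespace; whether the cleanup step is needed depends on how large the input really is.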

See the lxml.etree docs for more.
