Parsing a partial XML with python lxml
Question
I'm trying to parse, in Python, a large XML file that is being received over the network.
To do that, I read the data and pass it to lxml.etree.iterparse.
However, if the XML has not been fully sent yet, like so:
<MyXML>
<MyNode foo="bar">
<MyNode foo="ba
If I run next(etree.iterparse(f, tag='MyNode')), I get an XMLSyntaxError at the point where the data was cut off.
Is there any way to receive the first tag (i.e. the first MyNode) and only get an exception when the parser reaches the truncated part of the document? (That is, to make lxml really 'stream' the contents instead of reading the whole thing at the beginning.)
Recommended answer
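XMLPullParser and HTMLPullParser may better suit your needs. They receive their data through repeated calls to parser.feed(data). You still have to wait until all of the data has arrived before the complete tree (returned by parser.close()) is usable, but elements whose end tags have already been fed can be retrieved with parser.read_events() after each feed() call.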
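A minimal sketch of that approach, assuming lxml's XMLPullParser with its `tag` keyword and a hypothetical helper `iter_nodes` that takes any iterable of byte chunks (for example, pieces read from a socket). It feeds each chunk, yields every matching element as soon as its end tag has been parsed, and only raises XMLSyntaxError when parser.close() finds that the document was truncated. This is an illustration, not the answerer's own code.

```python
from lxml import etree


def iter_nodes(chunks, tag):
    """Hypothetical helper: yield each completed `tag` element as data arrives.

    `chunks` is any iterable of bytes, e.g. pieces read from a socket.
    """
    # Only 'end' events are requested, so an element is reported once it is complete.
    parser = etree.XMLPullParser(events=("end",), tag=tag)
    for chunk in chunks:
        # Incomplete markup at the end of a chunk is buffered, not rejected.
        parser.feed(chunk)
        # Drain every element completed by the data fed so far.
        for _event, elem in parser.read_events():
            yield elem
            # Once the caller asks for the next element, free this one
            # (children, text, attributes) so a large document does not pile
            # up in memory. Callers must copy what they need before moving on.
            elem.clear()
    # End of input: raises etree.XMLSyntaxError if the document was cut off,
    # but only after all complete elements have already been yielded.
    parser.close()
```

For example, `next(iter_nodes([b'<MyXML><MyNode foo="bar"/>', b'<MyNode foo="ba'], "MyNode"))` should return the first MyNode, and the error appears only when the iteration is continued. Note that in the question's sample the first MyNode has no end tag yet, so an 'end'-event loop like this one has nothing to yield until its `</MyNode>` arrives.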