Parsing a partial XML with python lxml


Problem description

I'm trying to parse a large XML file which is being received from the network in Python.

In order to do that, I get the data and pass it to lxml.etree.iterparse.

However, if the XML has yet to fully be sent, like so:

<MyXML>
    <MyNode foo="bar">
    <MyNode foo="ba

If I run etree.iterparse(f, tag='MyNode').next(), I get an XMLSyntaxError at wherever it was cut off.

Is there any way I can make it so I can receive the first tag (i.e. the first MyNode) and only get an exception when I reach that part of the document? (To make lxml really 'stream' the contents and not read the whole thing in the beginning).

Recommended answer

XMLPullParser and HTMLPullParser may better suit your needs. They get their data through repeated calls to parser.feed(data). You still have to wait until all of the data comes in before the tree is usable.
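
As a rough illustration of the feed-based approach, here is a minimal sketch. While the complete tree needs all of the data, individual elements are reported through read_events() once their end tag has been parsed. The helper name iter_nodes and the two byte chunks are hypothetical stand-ins for data arriving from the network, and the first MyNode is written as a self-closing tag so that it is complete before the cut-off. The import falls back to the standard library's xml.etree.ElementTree, which provides an XMLPullParser with the same feed/read_events/close methods, only so the sketch runs where lxml is not installed.

```python
try:
    from lxml import etree  # the library discussed in this question
except ImportError:
    # Assumption for portability only: the standard library exposes an
    # XMLPullParser with the same feed()/read_events()/close() methods.
    import xml.etree.ElementTree as etree


def iter_nodes(chunks, tag="MyNode"):
    """Yield each completed `tag` element as soon as its end tag is parsed.

    `iter_nodes` is a hypothetical helper name, not part of lxml.
    """
    parser = etree.XMLPullParser(events=("end",))
    for chunk in chunks:
        parser.feed(chunk)
        # read_events() only returns events for data parsed so far.
        for _event, elem in parser.read_events():
            if elem.tag == tag:
                yield elem
    # close() raises a parse error if the document was cut off.
    parser.close()


# Hypothetical network chunks; the second one is truncated mid-attribute.
chunks = [b'<MyXML>\n    <MyNode foo="bar"/>\n', b'    <MyNode foo="ba']

seen = []
truncated = False
try:
    for node in iter_nodes(chunks):
        seen.append(node.get("foo"))
except etree.ParseError:  # lxml's XMLSyntaxError is a subclass of ParseError
    truncated = True

print(seen, truncated)
```

In this sketch the completed node is handed to the caller before any error is raised; the truncation only surfaces when the parser is closed without the rest of the document.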
