XML parsing - ElementTree vs SAX and DOM


Question

Python has several ways to parse XML...

I understand the very basics of parsing with SAX. It functions as a stream parser, with an event-driven API.

I understand the DOM parser also. It reads the XML into memory and converts it to objects that can be accessed with Python.

Generally speaking, it was easy to choose between the two depending on what you needed to do, memory constraints, performance, etc.

(Hopefully I'm correct so far.)

Since Python 2.5, we also have ElementTree. How does this compare to DOM and SAX? Which is it more similar to? Why is it better than the previous parsers?

Recommended answer

ElementTree is much easier to use, because it represents an XML tree (basically) as a structure of lists, and attributes are represented as dictionaries.
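To make the "lists and dictionaries" point concrete, here is a minimal sketch using the standard-library xml.etree.ElementTree module. The sample document and its tag names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
xml_text = (
    '<library name="city">'
    '<book id="1">Dune</book>'
    '<book id="2">Emma</book>'
    '</library>'
)

root = ET.fromstring(xml_text)

# An element behaves like a list of its child elements:
# it supports len(), indexing and iteration.
count = len(root)
first = root[0]
titles = [child.text for child in root]

# Attributes are exposed as a plain dict on each element.
attrs = root.attrib
book_id = first.get("id")

print(count, titles, attrs, book_id)
```

Indexing, iteration and dict lookups are most of what you need for simple data-style documents.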

ElementTree needs much less memory for XML trees than DOM (and thus is faster), and the parsing overhead via iterparse is comparable to SAX. Additionally, iterparse returns partial structures, and you can keep memory usage constant during parsing by discarding the structures as soon as you process them.
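The discard-as-you-go idea can be sketched roughly as follows. The record layout is an assumption made up for this example, and an in-memory buffer stands in for what would normally be a large file. Clearing the root after each record is one common way to drop processed elements. It is not the only one.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical input: many <record> elements under a single root.
data = io.BytesIO(
    b"<records>"
    + b"".join(
        b'<record id="%d"><value>%d</value></record>' % (i, i * i)
        for i in range(1000)
    )
    + b"</records>"
)

context = ET.iterparse(data, events=("start", "end"))
_, root = next(context)  # the first event is the start of the root element

total = 0
seen = 0
for event, elem in context:
    # An "end" event means the element and its children are fully built.
    if event == "end" and elem.tag == "record":
        total += int(elem.findtext("value"))
        seen += 1
        # Drop the processed record from the root so the tree does not grow.
        root.clear()

print(seen, total)
```

Each record is a complete small tree when its "end" event fires, so you get ElementTree's convenient API per record without holding the whole document in memory.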

ElementTree, as in Python 2.5, has only a small feature set compared to full-blown XML libraries, but it's enough for many applications. If you need a validating parser or complete XPath support, lxml is the way to go. For a long time, lxml used to be quite unstable, but I haven't had any problems with it since 2.1.
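For a sense of where the boundary lies: current standard-library versions of ElementTree support a limited subset of XPath in find() and findall(), such as descendant searches and simple attribute predicates. A small sketch, with a made-up document:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
root = ET.fromstring(
    '<shelf>'
    '<book lang="en"><title>Dune</title></book>'
    '<book lang="de"><title>Momo</title></book>'
    '</shelf>'
)

# Descendant search plus an attribute predicate: part of the supported subset.
german_titles = [t.text for t in root.findall(".//book[@lang='de']/title")]

# findtext() returns the text of the first match, or a default if none.
first_title = root.findtext("book/title")
missing = root.findtext("book/isbn", default="n/a")

print(german_titles, first_title, missing)
```

Anything beyond this subset, such as XPath functions or full axes, is where a library like lxml comes in.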

ElementTree deviates from DOM, where nodes have access to their parent and siblings. Handling actual documents rather than data stores is also a bit cumbersome, because text nodes aren't treated as actual nodes. In the XML snippet

<a>This is <b>a</b> test</a>

the string test is the so-called tail of element b.
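Parsing that exact snippet shows how the text ends up split between the text and tail attributes. The parent lookup at the end is a common workaround sketch for the missing parent pointer. The name parent_of is just an illustrative choice.

```python
import xml.etree.ElementTree as ET

a = ET.fromstring("<a>This is <b>a</b> test</a>")
b = a[0]

# Text before the first child belongs to the parent's .text,
# text after a child's closing tag belongs to that child's .tail.
print(repr(a.text))  # 'This is '
print(repr(b.text))  # 'a'
print(repr(b.tail))  # ' test'

# itertext() yields text and tail strings in document order.
full_text = "".join(a.itertext())
print(full_text)  # This is a test

# Elements carry no parent reference; build a child-to-parent map if needed.
parent_of = {child: parent for parent in a.iter() for child in parent}
print(parent_of[b].tag)  # a
```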

In general, I recommend ElementTree as the default for all XML processing with Python, and DOM or SAX as the solutions for specific problems.
