Why is SAX parsing faster than DOM parsing? And how does StAX work?


Question


somewhat related to: libxml2 from java

Yes, this question is rather long-winded - sorry. I kept it as dense as I felt possible. I bolded the questions to make it easier to peek at before reading the whole thing.

Why is sax parsing faster than dom parsing? The only thing I can come up with is that w/ sax you're probably ignoring the majority of the incoming data, and thus not wasting time processing parts of the xml you don't care about. IOW - after parsing w/ SAX, you can't recreate the original input. If you wrote your SAX parser so that it accounted for each and every xml node (and could thus recreate the original), then it wouldn't be any faster than DOM would it?

The reason I'm asking is that I'm trying to parse xml documents more quickly. I need to have access to the entire xml tree AFTER parsing. I am writing a platform for 3rd party services to plug into, so I can't anticipate what parts of the xml document will be needed and which parts won't. I don't even know the structure of the incoming document. This is why I can't use jaxb or sax. Memory footprint isn't an issue for me because the xml documents are small and I only need 1 in memory at a time. It's the time it takes to parse this relatively small xml document that is killing me. I haven't used stax before, but perhaps I need to investigate further because it might be the middle ground? If I understand correctly, stax keeps the original xml structure and processes the parts that I ask for on demand? In this way, the original parse time might be quick, but each time I ask it to traverse part of the tree it hasn't yet traversed, that's when the processing takes place?

If you provide a link that answers most of the questions, I will accept your answer (you don't have to directly answer my questions if they're already answered elsewhere).

update: I rewrote it in sax and it now parses documents in 2.1 ms on average. This is an improvement (16% faster) over the 2.5 ms that dom was taking, however it is not the magnitude that I (et al) would've guessed.

Thanks

Solution

Assuming you do nothing but parse the document, the ranking of the different parser standards is as follows:

1. StAX is the fastest

  • The event is reported to you

2. SAX is next

  • It does everything StAX does plus the content is realized automatically (element name, namespace, attributes, ...)

3. DOM is last

  • It does everything SAX does and presents the information as an instance of Node.
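To make the difference between the first two entries concrete, here is a minimal sketch that counts elements once with StAX and once with SAX, using only the JDK's built-in JAXP implementations. The class name `PullVsPush` and the sample document are made up for illustration. With StAX the loop asks for each event and never requests names or attributes; with SAX the parser hands the name, namespace and attributes to the callback whether or not they are used.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not taken from the original answer.
class PullVsPush {

    // StAX (pull): the application asks for the next event and decides
    // for itself whether to read the name or attributes behind it.
    static int countElementsStax(String xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    count++; // name and attributes are never requested here
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    // SAX (push): the parser calls back with the element name, namespace
    // and attributes already supplied as arguments.
    static int countElementsSax(String xml) throws Exception {
        final int[] count = {0};
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes attributes) {
                        count[0]++; // arguments are ignored, but were still provided
                    }
                });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<order id=\"1\"><item sku=\"a\"/><item sku=\"b\"/></order>";
        System.out.println("StAX elements: " + countElementsStax(xml));
        System.out.println("SAX elements:  " + countElementsSax(xml));
    }
}
```

How large the gap is in practice depends on the parser implementation and the documents involved, so it is worth timing both against your own input.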

Your Use Case

  • If you need to maintain all of the XML, DOM is the standard representation. It integrates cleanly with the XSLT transform (javax.xml.transform), XPath (javax.xml.xpath), and schema validation (javax.xml.validation) APIs. However, if performance is key, you may be able to build your own tree structure using StAX faster than a DOM parser could build a DOM.
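As a rough illustration of the "build your own tree with StAX" idea, the sketch below reads a document with `XMLStreamReader` and keeps every element, attribute and text run in a small hand-rolled node type. `StaxTreeBuilder` and `Element` are hypothetical names, not a standard API, and the sketch assumes you can live without namespaces, comments and processing instructions, which it drops.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of any standard library.
class StaxTreeBuilder {

    // Lightweight node type: far less than org.w3c.dom.Node offers.
    static final class Element {
        final String name;
        final Map<String, String> attributes = new LinkedHashMap<>();
        final List<Element> children = new ArrayList<>();
        final StringBuilder text = new StringBuilder();

        Element(String name) {
            this.name = name;
        }
    }

    static Element parse(String xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        Deque<Element> open = new ArrayDeque<>(); // elements not yet closed
        Element root = null;
        try {
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT: {
                        Element element = new Element(reader.getLocalName());
                        for (int i = 0; i < reader.getAttributeCount(); i++) {
                            element.attributes.put(reader.getAttributeLocalName(i),
                                                   reader.getAttributeValue(i));
                        }
                        if (open.isEmpty()) {
                            root = element;
                        } else {
                            open.peek().children.add(element);
                        }
                        open.push(element);
                        break;
                    }
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        // Text may arrive in several chunks, so append each one.
                        if (!open.isEmpty()) {
                            open.peek().text.append(reader.getText());
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        open.pop();
                        break;
                    default:
                        break; // comments, PIs, etc. are ignored in this sketch
                }
            }
        } finally {
            reader.close();
        }
        return root;
    }

    public static void main(String[] args) throws Exception {
        Element root = parse("<order id=\"7\"><item sku=\"a\">pen</item><item sku=\"b\"/></order>");
        System.out.println(root.name + " has " + root.children.size() + " children");
    }
}
```

The trade-off is that a custom tree like this does not plug into javax.xml.transform, javax.xml.xpath or javax.xml.validation the way a DOM does, so any speed gain should be measured against the loss of those integrations.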
