How to read a large XML file consisting of a large number of small items efficiently in Java?
Question
I have a large XML file that consists of relatively fixed-size items, i.e.:
<rootElem>
<item>...</item>
<item>...</item>
<item>...</item>
</rootElem>
The item elements are relatively shallow and typically rather small (<100 KB), but there may be a lot of them (hundreds of thousands). The items are completely independent of each other.
How can I process the file efficiently in Java? I can't read the whole file in as a DOM, and I'd rather not use SAX because the code gets rather complex. I'd also like to avoid splitting the file into smaller pieces.
Ideally, I would obtain each item element, one at a time, as a separate DOM document that I could process using tools like JAXB. Basically, I just want to loop once over all the items.
I would think that this is a rather common problem.
Recommended answer
Java 6 has built-in StAX support. It performs stream processing like SAX, but uses a pull-based approach, which leads to simpler handling code.
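As a rough illustration of the pull style, here is a minimal sketch using the StAX cursor API (`XMLStreamReader`). The class name `StaxItemReader` and the method `forEachItem` are made up for this example, and it assumes each `item` contains only text; real items with nested children would need extra handling at the point where `getElementText()` is called, for instance by handing the positioned reader to JAXB's `Unmarshaller.unmarshal(XMLStreamReader, Class)`.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration; not part of any library.
final class StaxItemReader {

    // Streams through the document once and passes the text of each <item>
    // to the handler. Only the current item is held in memory.
    // Assumption: <item> elements are text-only (no child elements).
    static void forEachItem(Reader input, Consumer<String> handler)
            throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Defensive settings: disable DTDs and external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);

        XMLStreamReader reader = factory.createXMLStreamReader(input);
        try {
            while (reader.hasNext()) {
                // Pull the next event; we decide when to advance, not the parser.
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "item".equals(reader.getLocalName())) {
                    // Reads up to the matching end tag of this item.
                    handler.accept(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<rootElem>\n<item>first</item>\n<item>second</item>\n</rootElem>";
        forEachItem(new StringReader(xml), text -> System.out.println("item: " + text));
    }
}
```

For a real file, a `FileReader` or an `InputStream`-based reader would replace the `StringReader`. Because the loop asks the parser for the next event, the per-item logic stays in one place instead of being spread over SAX callbacks.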