How do I iterate over nodes in a huge XML file in a streaming fashion?
Question
I have a gigantic XML file, like this:
<?xml version="1.0"?>
<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
    <title>XML Developer's Guide</title>
  </book>
  <book id="bk102">
    <author>Ralls, Kim</author>
    <title>Midnight Rain</title>
  </book>
  [... one gazillion more entries ...]
</catalog>
I want to iterate over this file in a streaming fashion, so that I never have to load the whole thing into memory. Something like:
// StreamingXmlIterator is the API I wish existed, not a real class
InputStream stream = new FileInputStream("gigantic-book-list.xml");
String nodeName = "book";
Iterator<Document> it = new StreamingXmlIterator(stream, nodeName);
Document bk101 = it.next();
Document bk102 = it.next();
Also, I'd like this to work with different XML input files, without having to create specific classes (e.g. Book.java).
@McDowell has a promising approach that uses XMLStreamReader and StreamFilter at https://stackoverflow.com/a/16799693/13365, but it only extracts a single node.
Also, Camel's .tokenizeXML does exactly what I want, so I guess I should look into its source code.
Recommended answer
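// JAXB 2.x package; JAXB 3.0 and later use jakarta.xml.bind instead
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement
public class Book {
  // TODO: getters/setters
  public String author;
  public String title;
}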
Assuming you want to process the data as strongly typed objects (such as the Book class above), you can combine StAX and JAXB using two utility types:
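import javax.xml.stream.StreamFilter;
import javax.xml.stream.XMLStreamReader;

// Accepts only the events from <book> through the matching </book>
class ContentFinder implements StreamFilter {
  private boolean capture = false;

  @Override
  public boolean accept(XMLStreamReader xml) {
    if (xml.isStartElement() && "book".equals(xml.getLocalName())) {
      capture = true;
    } else if (xml.isEndElement() && "book".equals(xml.getLocalName())) {
      capture = false;
      return true; // the closing tag still belongs to the current book
    }
    return capture;
  }
}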
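import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.util.StreamReaderDelegate;

// Reports end-of-stream once the wrapped reader sits on a </book> end tag
class Limiter extends StreamReaderDelegate {
  Limiter(XMLStreamReader xml) {
    super(xml);
  }

  @Override
  public boolean hasNext() throws XMLStreamException {
    return !(getParent().isEndElement()
        && "book".equals(getParent().getLocalName()));
  }
}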
Usage:
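import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stax.StAXSource;

// inputStream is an InputStream over the XML file
XMLInputFactory inFactory = XMLInputFactory.newFactory();
XMLStreamReader reader = inFactory.createXMLStreamReader(inputStream);
reader = inFactory.createFilteredReader(reader, new ContentFinder());
Unmarshaller unmar = JAXBContext.newInstance(Book.class).createUnmarshaller();
Transformer tformer = TransformerFactory.newInstance().newTransformer();
while (reader.hasNext()) {
  // Copy one <book> subtree into a small DOM, then bind it with JAXB
  XMLStreamReader limiter = new Limiter(reader);
  Source src = new StAXSource(limiter);
  DOMResult res = new DOMResult();
  tformer.transform(src, res);
  Book book = (Book) unmar.unmarshal(res.getNode());
  System.out.println(book.title);
}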
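The question also asks for a variant that is not tied to a class such as Book. One way to get there is to drop the JAXB step and hand each matching element to the caller as a small DOM Document. The following is only a minimal sketch of that idea, assuming the JDK's built-in StAX parser and identity Transformer. The names StreamingXmlNodes, NodeHandler and forEachNode are made up for illustration, and a callback is used in place of the Iterator from the question to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stax.StAXSource;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical helper, not part of the original answer.
public class StreamingXmlNodes {

  /** Callback invoked once per matching element. */
  public interface NodeHandler {
    void handle(Document node) throws Exception;
  }

  /** Streams through the input and passes each nodeName element as its own Document. */
  public static void forEachNode(InputStream in, String nodeName, NodeHandler handler)
      throws Exception {
    XMLInputFactory factory = XMLInputFactory.newFactory();
    // Do not resolve external entities in untrusted input
    factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
    XMLStreamReader reader = factory.createXMLStreamReader(in);
    Transformer copier = TransformerFactory.newInstance().newTransformer();
    try {
      while (reader.hasNext()) {
        if (reader.getEventType() == XMLStreamConstants.START_ELEMENT
            && nodeName.equals(reader.getLocalName())) {
          // The transform consumes exactly this element's subtree. Afterwards the
          // reader may already sit on the next event, so the loop re-checks the
          // current event instead of calling next() unconditionally.
          DOMResult result = new DOMResult();
          copier.transform(new StAXSource(reader), result);
          Node node = result.getNode();
          handler.handle(node instanceof Document ? (Document) node : node.getOwnerDocument());
        } else {
          reader.next();
        }
      }
    } finally {
      reader.close();
    }
  }

  public static void main(String[] args) throws Exception {
    String xml = "<catalog>"
        + "<book id=\"bk101\"><title>XML Developer's Guide</title></book>"
        + "<book id=\"bk102\"><title>Midnight Rain</title></book>"
        + "</catalog>";
    InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    // Print the id attribute of every <book>, one element at a time
    forEachNode(in, "book",
        doc -> System.out.println(doc.getDocumentElement().getAttribute("id")));
  }
}
```

Only one element's DOM is held in memory at a time, so memory use should be bounded by the largest single node rather than by the size of the file. If an Iterator is really needed, the same loop body could be moved into a next() method, at the cost of handling checked exceptions there.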