How do I iterate over nodes in a huge XML in a streaming fashion?

Question

I have a gigantic XML file, like this:

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
   </book>
   [... one gazillion more entries ...]
</catalog>

I want to iterate over this file in a streaming fashion, so that I never have to load the whole thing into memory. Something like:

InputStream stream = new FileInputStream("gigantic-book-list.xml");
String nodeName = "book";
Iterator it = new StreamingXmlIterator(stream, nodeName);
Document bk101 = it.next();
Document bk102 = it.next();

Also, I'd like this to work with different XML input files, without having to create specific classes (e.g. Book.java).

@McDowell has a promising approach that uses XMLStreamReader and StreamFilter at https://stackoverflow.com/a/16799693/13365, but it only extracts a single node.

Also, Camel's .tokenizeXML does exactly what I want, so I guess I should look into its source code.

Recommended answer

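// JAXB annotation; JAXB left the JDK in Java 11 and is a separate dependency there.
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement
public class Book {
  // TODO: getters/setters
  public String author;
  public String title;
}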

Assuming you want to process the data as strongly typed objects, you can combine StAX and JAXB using two utility types:

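import javax.xml.stream.StreamFilter;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.util.StreamReaderDelegate;

// Accepts only the events from a <book> start tag through its matching end tag;
// everything outside a book is filtered out. Assumes books are not nested.
class ContentFinder implements StreamFilter {
  private boolean capture = false;

  @Override
  public boolean accept(XMLStreamReader xml) {
    if (xml.isStartElement() && "book".equals(xml.getLocalName())) {
      capture = true;
    } else if (xml.isEndElement() && "book".equals(xml.getLocalName())) {
      capture = false;
      return true; // still let the closing tag through
    }
    return capture;
  }
}

// Reports "no more events" once the wrapped reader sits on </book>,
// so a consumer stops after exactly one book.
class Limiter extends StreamReaderDelegate {
  Limiter(XMLStreamReader xml) {
    super(xml);
  }

  @Override
  public boolean hasNext() throws XMLStreamException {
    return !(getParent().isEndElement()
             && "book".equals(getParent().getLocalName()));
  }
}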

Usage:

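import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stax.StAXSource;

// inputStream is the caller's InputStream over the XML file.
XMLInputFactory inFactory = XMLInputFactory.newFactory();
XMLStreamReader reader = inFactory.createXMLStreamReader(inputStream);
reader = inFactory.createFilteredReader(reader, new ContentFinder());
Unmarshaller unmar = JAXBContext.newInstance(Book.class)
    .createUnmarshaller();
Transformer tformer = TransformerFactory.newInstance().newTransformer();
while (reader.hasNext()) {
  // Copy one <book> into a small DOM tree, then unmarshal it with JAXB.
  XMLStreamReader limiter = new Limiter(reader);
  Source src = new StAXSource(limiter);
  DOMResult res = new DOMResult();
  tformer.transform(src, res);
  Book book = (Book) unmar.unmarshal(res.getNode());
  System.out.println(book.title);
}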
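
The question also asks for a version that works without a class per element type. One possible sketch keeps the same StAX-plus-Transformer idea but drops JAXB and hands back each matching element as its own DOM Document. StreamingXmlIterator is a hypothetical class named after the one in the question, not a library type. Where the reader ends up after the transform is implementation-dependent, so the sketch re-checks the current event before advancing.

```java
import java.io.InputStream;
import java.util.Iterator;
import java.util.NoSuchElementException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stax.StAXSource;
import org.w3c.dom.Document;

// Hypothetical helper (name taken from the question): yields one DOM
// Document per element whose local name equals nodeName.
final class StreamingXmlIterator implements Iterator<Document> {
  private final XMLStreamReader reader;
  private final Transformer identity;
  private final DocumentBuilder builder;
  private final String nodeName;

  StreamingXmlIterator(InputStream in, String nodeName) {
    try {
      this.reader = XMLInputFactory.newFactory().createXMLStreamReader(in);
      this.identity = TransformerFactory.newInstance().newTransformer();
      this.builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    } catch (Exception e) {
      throw new IllegalStateException("Could not set up XML streaming", e);
    }
    this.nodeName = nodeName;
  }

  // True when the reader is positioned on a start tag with the wanted name.
  private boolean atMatch() {
    return reader.isStartElement() && nodeName.equals(reader.getLocalName());
  }

  @Override
  public boolean hasNext() {
    try {
      // Check the current event first, then skip forward until a match
      // or the end of the document.
      while (!atMatch()) {
        if (!reader.hasNext()) {
          return false;
        }
        reader.next();
      }
      return true;
    } catch (XMLStreamException e) {
      throw new IllegalStateException(e);
    }
  }

  @Override
  public Document next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    try {
      // The identity transform copies just the current element subtree
      // into a fresh Document and moves the reader past it.
      Document doc = builder.newDocument();
      identity.transform(new StAXSource(reader), new DOMResult(doc));
      return doc;
    } catch (TransformerException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

With this, the loop from the question becomes `Iterator<Document> it = new StreamingXmlIterator(stream, "book");` followed by `while (it.hasNext()) { Document book = it.next(); }`. Only the DOM tree of the current element is built at any time, and the node name is a plain string, so the same class can be pointed at other input files.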
