Parsing large XML documents in Java

Problem description

I have the following problem:

I've got an XML file (approx. 1 GB) and have to move up and down through it (i.e. not sequentially, one element after the other) in order to get the required data and perform some operations on it. Initially I used the Java DOM API, but, as you might expect, while parsing the XML file the JVM reaches its maximum heap space and halts.

To overcome this problem, one solution I came up with was to use a different parser that iterates over each element in the XML and stores its contents in a temporary SQLite database on my hard disk. This way the JVM's heap is not exceeded, and once all the data has been loaded, I can disregard the XML file and continue my operations on the temporary SQLite database.

Is there another way to tackle the problem at hand?

Recommended answer

SAX (Simple API for XML) will help you here.


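Unlike a DOM parser, a SAX parser does not create an in-memory representation of the XML document, so it is faster and uses less memory. Instead, the SAX parser informs clients of the XML document structure by invoking callbacks, that is, by invoking methods on an org.xml.sax.helpers.DefaultHandler instance.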
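To make the callback model concrete, here is a minimal sketch that parses a tiny in-memory document and records each callback as it fires. The class names EventTraceHandler and EventTraceDemo are hypothetical, chosen for illustration only; the parser calls are the standard JAXP/SAX ones. Nothing but a list of event names is kept, so no document tree is ever built.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records the name of every callback the parser
// invokes, showing that SAX reports structure as a stream of events.
class EventTraceHandler extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) {
        events.add("start:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end:" + qName);
    }
}

class EventTraceDemo {
    // Parses an in-memory XML string and returns the observed event sequence.
    static List<String> trace(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            EventTraceHandler handler = new EventTraceHandler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

For example, `EventTraceDemo.trace("<a><b/><c>x</c></a>")` yields `startDocument, start:a, start:b, end:b, start:c, end:c, end:a, endDocument`. The empty element `<b/>` still produces a start/end pair, and the text `x` is skipped because this handler does not override `characters()`.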

Here is an example implementation:

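// Requires javax.xml.parsers.SAXParser and javax.xml.parsers.SAXParserFactory.
// newSAXParser() and parse() throw checked exceptions that the caller must handle.
SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
DefaultHandler handler = new MyHandler();
// parse(String, DefaultHandler) takes a URI; a parse(File, DefaultHandler) overload also exists.
parser.parse("file.xml", handler);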

In MyHandler you define the actions to take when events such as the start or end of the document, or of an element, are generated:

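import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class MyHandler extends DefaultHandler {

    // Called once, before the other content callbacks.
    @Override
    public void startDocument() throws SAXException {
    }

    // Called once, as the last event of the parse.
    @Override
    public void endDocument() throws SAXException {
    }

    // Called for each opening tag; attributes holds that element's attributes.
    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException {
    }

    // Called for each closing tag (an empty element like <x/> also gets one).
    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
    }

    // To take specific actions for each chunk of character data (such as
    // adding the data to a node or buffer, or printing it to a file).
    // Note that a single text node may be delivered in several chunks.
    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
    }

}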
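A SAX parser only moves forward through the file, so it cannot by itself "iterate up and down" the way a DOM tree can. The usual approach is to pull out just the data you need in a single pass. The sketch below assumes you want the text of every element with a given name (hypothetical: `targetElement`, for example `name`) and that such elements are not nested inside one another. It buffers the `characters()` chunks in a `StringBuilder` and passes each finished value to a `Consumer`. In the asker's plan, that consumer could insert a row into the temporary SQLite database (not shown; it would need a JDBC driver). The class names ElementTextHandler and ElementTextDemo are illustrative, not library types.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: hands the text of every <targetElement> to a sink
// as soon as the element closes, so memory is bounded by one element's text.
// Assumes target elements are not nested inside each other.
class ElementTextHandler extends DefaultHandler {
    private final String targetElement;
    private final Consumer<String> sink;
    private final StringBuilder text = new StringBuilder();
    private boolean inTarget;

    ElementTextHandler(String targetElement, Consumer<String> sink) {
        this.targetElement = targetElement;
        this.sink = sink;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) {
        if (qName.equals(targetElement)) {
            inTarget = true;
            text.setLength(0); // reset the buffer for the new element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // One text node may arrive in several chunks, so append, never overwrite.
        if (inTarget) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals(targetElement)) {
            inTarget = false;
            sink.accept(text.toString());
        }
    }
}

class ElementTextDemo {
    // Collects the values into a list; a real sink could write them elsewhere.
    static List<String> collect(String xml, String element) {
        List<String> values = new ArrayList<>();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new StringReader(xml)),
                    new ElementTextHandler(element, values::add));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
        return values;
    }
}
```

For a 1 GB file you would pass a `java.io.File` or `InputStream` to `parse()` instead of a `StringReader`. Only the current element's text is held in memory at any time.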
