Parsing large XML documents in Java
Problem description
I have the following problem:
I've got an XML file (approx. 1 GB) and have to iterate up and down it (i.e. not sequentially, one element after the other) in order to get the required data and perform some operations on it. Initially I used the Java DOM package, but while parsing the XML file the JVM reaches its maximum heap size and halts.
To overcome this problem, one solution I came up with was to find another parser that iterates over each element in the XML, and to store each element's contents in a temporary SQLite database on my hard disk. That way the JVM's heap is not exceeded, and once all the data has been loaded I can ignore the XML file and continue my operations on the temporary SQLite database.
Is there another way to tackle this problem?
Recommended answer
SAX (Simple API for XML) will help you here.
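Unlike a DOM parser, a SAX parser does not create an in-memory representation of the XML document, so it is faster and uses less memory. Instead, the SAX parser informs clients of the XML document's structure by invoking callbacks, that is, by calling methods on an org.xml.sax.helpers.DefaultHandler instance that is provided to the parser.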
Here is an example implementation:
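import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;
SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
DefaultHandler handler = new MyHandler();
parser.parse("file.xml", handler);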
In MyHandler, you define the actions to take when events such as the start or end of the document, or the start or end of an element, are generated.
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class MyHandler extends DefaultHandler {
    // Called once, before any other event.
    @Override
    public void startDocument() throws SAXException {
    }

    // Called once, after the whole document has been read.
    @Override
    public void endDocument() throws SAXException {
    }

    // Called for every opening tag; attributes holds the tag's attributes.
    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException {
    }

    // Called for every closing tag.
    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
    }

    // To take specific actions for each chunk of character data (such as
    // adding the data to a node or buffer, or printing it to a file).
    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
    }
}
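To make the empty callbacks above more concrete, here is a minimal sketch of a handler that extracts the text of one kind of element. The class name TitleCollector, the element name title, and the sample XML are all hypothetical and chosen only for illustration. Since a SAX parser may deliver the text of a single element in several characters() calls, the sketch buffers the chunks and only uses the text in endElement().

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of every <title> element.
public class TitleCollector extends DefaultHandler {
    private final List<String> titles = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private boolean inTitle = false;

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) {
        if ("title".equals(qName)) {
            inTitle = true;
            text.setLength(0); // start a fresh buffer for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called more than once per element, so append rather than assign.
        if (inTitle) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName)) {
            titles.add(text.toString().trim());
            inTitle = false;
        }
    }

    public List<String> getTitles() {
        return titles;
    }

    // Convenience method for the example: parses an in-memory XML string.
    public static List<String> parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TitleCollector handler = new TitleCollector();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.getTitles();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library><book><title>First</title></book>"
                + "<book><title>Second</title></book></library>";
        System.out.println(parse(xml)); // prints [First, Second]
    }
}
```

For a file of around 1 GB, collecting everything in a list would defeat the purpose. In that case endElement() would instead hand each completed record to wherever it needs to go, for example the temporary SQLite database mentioned in the question, so that only one record is held in memory at a time.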