Huge XML file: Do I read a "page" and process it each time?

Question

I need to process a huge XML file (4 GB). I am using dom4j with SAX, but I wrote my own DefaultElementHandler. The code skeleton is below:

SAXParserFactory sf = SAXParserFactory.newInstance();
SAXParser sax = sf.newSAXParser();
sax.parse("english.xml", new DefaultElementHandler("page") {
    public void processElement(Element element) {
        // process the element
    }
});

I thought I was processing the huge file "page" by "page", but apparently not, because I always get an out-of-memory error. Did I miss anything important? Thanks. I am new to XML processing.

Recommended answer

Your DefaultElementHandler implementation looks confused to me. Everything appears to pile up in sBuilder, which is not cleared until the handler reaches the end of the root element or, more likely, the JVM runs out of memory.

How you read element text depends on the kind of XML you need to parse. Any element can contain text, and that text can be interspersed with child elements. Broadly, there are two kinds: the XML you see in web services and config files, where all element text sits in leaf elements, and mixed-content formats such as XHTML, where text and child elements are interleaved.
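
To make the difference concrete, here is a small sketch (not taken from the question's code) that logs the SAX events for a mixed-content fragment. The class name MixedContentDemo and the sample XML are made up for illustration. Consecutive characters() calls are merged in the log, because a parser is allowed to deliver one run of text in several chunks.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: records the order in which SAX reports tags and text.
class MixedContentDemo extends DefaultHandler {
    private final List<String> log = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        log.add("<" + qName + ">");
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String chunk = new String(ch, start, length);
        int last = log.size() - 1;
        if (last >= 0 && log.get(last).startsWith("text:")) {
            // Same run of text delivered in another chunk: merge it.
            log.set(last, log.get(last) + chunk);
        } else {
            log.add("text:" + chunk);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        log.add("</" + qName + ">");
    }

    static List<String> events(String xml) {
        MixedContentDemo handler = new MixedContentDemo();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.log;
    }

    public static void main(String[] args) {
        // The text of <p> arrives in two pieces, before and after the <b> child.
        System.out.println(events("<p>Hello <b>big</b> world</p>"));
        // prints [<p>, text:Hello , <b>, text:big, </b>, text: world, </p>]
    }
}
```

In the mixed case the parent's text is split around the child element, so a handler that resets its buffer on every startElement would lose "Hello " unless it keeps one buffer per open element.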

If your XML schema puts all the text in leaf elements, you can start buffering text in startElement, use the accumulated text in endElement, and then clear the buffer.
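
A minimal sketch of that buffering pattern, using only the JDK's SAX classes. The handler name LeafTextHandler and the title element are assumptions for illustration; the question only shows that the file contains page elements. The demo collects values in a list so there is something to print. For a 4 GB file you would process each value in endElement and let it go.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: assumes all text lives in leaf elements.
class LeafTextHandler extends DefaultHandler {
    // Reused buffer; it only ever holds the text of the current element.
    private final StringBuilder text = new StringBuilder();
    // Demo only: a real run over a huge file should not keep every value.
    private final List<String> titles = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // start every element with an empty buffer
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName)) {       // "title" is an assumed leaf element name
            titles.add(text.toString().trim());
        }
        text.setLength(0); // clear so nothing accumulates across elements
    }

    static List<String> parseTitles(String xml) {
        LeafTextHandler handler = new LeafTextHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.titles;
    }

    public static void main(String[] args) {
        String xml = "<pages><page><title>A</title></page><page><title>B</title></page></pages>";
        System.out.println(parseTitles(xml)); // prints [A, B]
    }
}
```

Because the buffer is emptied at every element boundary, memory use is bounded by the largest single leaf value rather than by the size of the file.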

Here's a good article on SAX: http://www.javaworld.com/javaworld/jw-08-2000/jw-0804-sax.html
