libxml2 from java


Problem description

This question is somewhat related to "Fastest XML parser for small, simple documents in Java", but with a few more specifics.

I'm working on an application that needs to parse many (tens of millions) small (approx. 300k) XML documents. The current implementation uses xerces-j and takes about 2.5 ms per XML document on a 1.5 GHz machine. I'd like to improve this performance. I came across this article:

http://www.xml.com/pub/a/2007/05/16/xml-parser-benchmarks-part-2.html

claiming that libxml2 can parse about an order of magnitude faster than any Java parser. I'm not sure I believe it, but it caught my attention. Has anyone tried using libxml2 from the JVM? If so, is it faster than Java DOM parsing (Xerces)? I think I'd still need my Java DOM structure, but I'm guessing that copying from a C-structured DOM into a Java DOM shouldn't take long. I must have a Java DOM; SAX will not help me in this case.

Update: I just wrote a test for libxml2 and it wasn't any faster than Xerces, though granted, my C coding ability is extremely rusty.

Update: I broadened the question a bit here: "why is SAX parsing faster than DOM parsing? and how does StAX work?", and I am open to the possibility of ditching DOM.

Thanks

Solution

In Java, StAX (JSR-173) is generally considered to be the fastest approach to parsing XML. There are multiple implementations of StAX; the Woodstox implementation is generally regarded as being fast.
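As a rough illustration of what pull parsing with StAX looks like, here is a minimal sketch that counts elements with a given name. It uses only the `javax.xml.stream` API that ships with the JDK; the class name `StaxSketch`, the method `countElements`, and the sample document are made up for this example. If Woodstox is on the classpath, `XMLInputFactory.newInstance()` would normally pick it up through the standard provider lookup, but that is an assumption about your setup, not something the code enforces.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StaxSketch {
    // Factory lookup is comparatively expensive, so do it once and reuse the
    // instance (this sketch assumes single-threaded use).
    private static final XMLInputFactory FACTORY = XMLInputFactory.newInstance();

    /** Counts start tags whose local name equals localName (hypothetical helper). */
    static int countElements(String xml, String localName) {
        try {
            XMLStreamReader reader = FACTORY.createXMLStreamReader(new StringReader(xml));
            try {
                int count = 0;
                // Pull events one at a time; no tree is built in memory.
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && localName.equals(reader.getLocalName())) {
                        count++;
                    }
                }
                return count;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Prints 2 for this sample document.
        System.out.println(countElements("<order><item/><item/></order>", "item"));
    }
}
```

The application drives the loop and asks for the next event, so nothing is allocated for parts of the document it does not look at. That is the main difference from DOM, which materializes every node before you can touch any of them.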

To improve performance I would avoid DOM. What are you doing with the XML? If you are ultimately dealing with it as objects, then you should consider an OXM (object-XML mapping) solution. The standard is JAXB (JSR-222). JAXB implementations such as MOXy (I'm the tech lead) will even allow you to do a partial mapping, which will improve performance.
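The idea behind a partial mapping can be illustrated without any binding library. The sketch below is not MOXy or JAXB; it is a hand-written StAX loop that fills in only the fields an application cares about and passes over everything else. The `Customer` class, the element names `id` and `name`, and the sample document are all hypothetical.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class PartialMappingSketch {
    /** Hypothetical target object holding only the two fields we need. */
    static final class Customer {
        String id;
        String name;
    }

    private static final XMLInputFactory FACTORY = XMLInputFactory.newInstance();

    /** Reads only <id> and <name> (matched by local name at any depth). */
    static Customer readCustomer(String xml) {
        Customer customer = new Customer();
        try {
            XMLStreamReader reader = FACTORY.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                        continue;
                    }
                    String element = reader.getLocalName();
                    if ("id".equals(element)) {
                        // getElementText() assumes a text-only element and
                        // leaves the reader on the matching end tag.
                        customer.id = reader.getElementText();
                    } else if ("name".equals(element)) {
                        customer.name = reader.getElementText();
                    }
                    // All other elements are passed over; no objects are
                    // created for them.
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return customer;
    }
}
```

For example, calling `PartialMappingSketch.readCustomer(...)` on `<customer><id>42</id><address><street>Main St</street></address><name>Ada</name></customer>` populates `id` and `name` and never builds anything for `address`. A binding framework with partial-mapping support aims for the same effect declaratively, so treat this only as a picture of the principle.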
