Java/XML: Good "Stream-based" Alternative to JAXB?

Question

JAXB makes working with XML so much easier, but I currently have a big problem: the documents I have to process are too large for the in-memory unmarshalling that JAXB does. The data can be up to 4 GB per document.

The data structure I have to process is very simple and flat: a root element containing millions of "elements":

<root>
<element>
<sub>foo</sub>
</element>
<element>
<sub>foo</sub>
</element>
</root>

I have the following questions:

  1. Does JAXB somehow support unmarshalling in a "stream-based" way that does not require building the whole object tree in memory, but instead gives me some kind of "Iterator" over the elements, element by element? (Maybe I just missed it somehow.)

  2. If not, what would you propose as a good alternative that
     a. has a flat learning curve, ideally very similar to JAXB, and
     b. (very important) ideally comes with a tool for generating the unmarshaller code from an XSD file or from annotated Java classes?

  3. I have searched SO, and the two libraries that ended up on my "watchlist" (without comparing them closely) were Apache XMLBeans and XStream. What other libraries might be even better for this purpose, and what are their advantages and disadvantages?

Thank you very much!
Jan

Recommended answer

Those are all the wrong approach, since they're all basically "bean" mappers. That is, they convert an XML document into Java beans. In order to do that, you pretty much have to pull the whole thing into memory.

Now, obviously, there are "better" ways it could be done. For example, it's not actually necessary to load the entire XML DOM in order to map a bean, but I don't actually know how JAXB et al. perform their unmarshalling. I suspect that they don't bother with a DOM, but rather populate bean fields directly as the XML streams by. This saves some processing overall, but you still end up with the entire document in RAM as a set of class instances.

Now, if you just want a little bit of the XML document, you might want to consider a StAX implementation. As I understand it, this is a DOM-like interface on top of a streaming parser. In the end, though, this may not be very good: I think these work by streaming only as much of the document as necessary, which means that if you need something at the front, you win, because it can throw the rest away. But if you want something at the end, I think it retains most of what it has seen up to that point. That's not good either.
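To make the StAX option concrete, here is a minimal sketch using the JDK's standard `javax.xml.stream` API. In that API, `XMLStreamReader` is a forward-only cursor: the application pulls one event at a time. The class name `StaxElementScan` and the `scan` helper are made up for illustration, and the element names come from the sample document in the question. For a real multi-gigabyte file you would pass a file-backed stream or reader instead of the `StringReader` used here.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of any library.
class StaxElementScan {

    /**
     * Passes the text of every <sub> to onSub and returns the number
     * of <element> start tags seen.
     */
    static long scan(Reader xml, Consumer<String> onSub) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        long elements = 0;
        try {
            while (reader.hasNext()) {
                // next() advances the cursor to the following parse event
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if ("element".equals(name)) {
                        elements++;
                    } else if ("sub".equals(name)) {
                        // getElementText() reads up to the matching end tag
                        onSub.accept(reader.getElementText());
                    }
                }
            }
        } finally {
            reader.close();
        }
        return elements;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<root><element><sub>foo</sub></element>"
                   + "<element><sub>bar</sub></element></root>";
        long n = scan(new StringReader(xml), text -> System.out.println("sub = " + text));
        System.out.println(n + " elements");
    }
}
```

The only state this sketch holds is the counter and whatever the callback chooses to keep.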

Which leaves you with good ol' SAX. And as everyone knows, with SAX you get the blues, because it's such a primitive layer. But it's the most efficient, and it gives you the most control.
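For a sense of what that primitive layer looks like in Java, here is a minimal sketch of a SAX handler for the sample document in the question, using only the JDK's `javax.xml.parsers` and `org.xml.sax` packages. The class name `SaxElementCounter`, its two counters, and the `count` helper are invented for illustration. It counts `<element>` entries and how many `<sub>` values equal "foo".

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, not part of any library.
class SaxElementCounter extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private boolean inSub;
    long elementCount;
    long fooCount;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // The default factory is not namespace-aware, so qName is the plain tag name.
        if ("element".equals(qName)) {
            elementCount++;
        } else if ("sub".equals(qName)) {
            inSub = true;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text may arrive in several chunks, so it has to be accumulated.
        if (inSub) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("sub".equals(qName)) {
            inSub = false;
            if ("foo".contentEquals(text)) {
                fooCount++;
            }
        }
    }

    /** Parses the source and returns the handler holding the counts. */
    static SaxElementCounter count(InputSource source) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        SaxElementCounter handler = new SaxElementCounter();
        parser.parse(source, handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><element><sub>foo</sub></element>"
                   + "<element><sub>bar</sub></element></root>";
        SaxElementCounter result = count(new InputSource(new StringReader(xml)));
        System.out.println(result.elementCount + " elements, " + result.fooCount + " with foo");
    }
}
```

Even in this tiny case the handler has to track where it is in the document by hand, which is the bookkeeping that mapping frameworks normally hide.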

The XSD mapping will be difficult, simply because the beauty of the mapping frameworks is that they know what to do with all of the elements (they create class instances and stuff them into parent classes). You want to do something different: something arbitrary at arbitrary points.

SAX isn't that bad. I wrote a nice little crude mapper that more or less lets you do what you want, except that you have to hand-code it rather than use an XSD, and it's in Obj-C, not Java. Basically, it walked the XML stream and looked for setters on classes based on the path name. This replaced the typical huge "if element = "name"..." chains in the element callback that you get with SAX code.
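The path-to-setter idea can be sketched in Java roughly as follows. This is not the Obj-C mapper described above: it is a guess at the general shape, and it swaps the setter lookup for an explicit map from element path to a setter method reference. `PathMapper`, `ElementRecord`, and the path strings are all hypothetical names chosen to match the sample document in the question. Each finished record is handed to a callback and then dropped, so memory use does not grow with the document.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Consumer;
import java.util.function.Supplier;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical bean for one <element> entry.
class ElementRecord {
    private String sub;
    void setSub(String sub) { this.sub = sub; }
    String getSub() { return sub; }
}

// Hypothetical mapper: dispatches on the element path instead of if-chains.
class PathMapper<T> extends DefaultHandler {
    private final String recordPath;                         // path that starts a new record
    private final Supplier<T> factory;                       // creates an empty record
    private final Map<String, BiConsumer<T, String>> setters; // path -> setter
    private final Consumer<T> sink;                          // receives each finished record
    private final StringBuilder path = new StringBuilder();
    private final StringBuilder text = new StringBuilder();
    private T current;

    PathMapper(String recordPath, Supplier<T> factory,
               Map<String, BiConsumer<T, String>> setters, Consumer<T> sink) {
        this.recordPath = recordPath;
        this.factory = factory;
        this.setters = setters;
        this.sink = sink;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        path.append('/').append(qName);
        text.setLength(0);
        if (recordPath.contentEquals(path)) {
            current = factory.get();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String here = path.toString();
        BiConsumer<T, String> setter = setters.get(here);
        if (setter != null && current != null) {
            setter.accept(current, text.toString().trim());
        }
        if (here.equals(recordPath)) {
            sink.accept(current);   // hand the record off, then forget it
            current = null;
        }
        path.setLength(path.lastIndexOf("/"));
        text.setLength(0);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><element><sub>foo</sub></element>"
                   + "<element><sub>bar</sub></element></root>";
        Map<String, BiConsumer<ElementRecord, String>> setters =
                Map.of("/root/element/sub", ElementRecord::setSub);
        PathMapper<ElementRecord> mapper = new PathMapper<>(
                "/root/element", ElementRecord::new, setters,
                record -> System.out.println(record.getSub()));
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), mapper);
    }
}
```

Supporting another field means adding one entry to the map rather than another branch in the callback. A reflection-based variant could derive the setter name from the last path segment, at the cost of losing compile-time checks.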

Not the answer you were looking for, I'm sure. I'd be happy to be proved wrong.
