Parsing XML for deeply nested data

Question

I have an XML file that is structured something like this:

<element1>
    <element2>
        <element3>
            <elementIAmInterestedIn attribute="data">
                <element4>
                    <element5>
                        <element6>
                            <otherElementIAmInterestedIn>
                                <data1>text1</data1>
                                <data2>text2</data2>
                                <data3>text3</data3>
                            </otherElementIAmInterestedIn>
                        </element6>
                    </element5>
                </element4>
            </elementIAmInterestedIn>
            <elementIAmInterestedIn attribute="data">
                <element4>
                    <element5>
                        <element6>
                            <otherElementIAmInterestedIn>
                                <data1>text1</data1>
                                <data2>text2</data2>
                                <data3>text3</data3>
                            </otherElementIAmInterestedIn>
                        </element6>
                    </element5>
                </element4>
            </elementIAmInterestedIn>
            <elementIAmInterestedIn attribute="data">
                <element4>
                    <element5>
                        <element6>
                            <otherElementIAmInterestedIn>
                                <data1>text1</data1>
                                <data2>text2</data2>
                                <data3>text3</data3>
                            </otherElementIAmInterestedIn>
                        </element6>
                    </element5>
                </element4>
            </elementIAmInterestedIn>
        </element3>
    </element2>
</element1>

As you can see, I am interested in two elements: the first is deeply nested within the root element, and the second is deeply nested within that first element. There are multiple (sibling) elementIAmInterestedIn and otherElementIAmInterestedIn elements in the document.

I want to parse this XML file with Java and put the data from all the elementIAmInterestedIn and otherElementIAmInterestedIn elements into either a data structure or Java objects - it doesn't matter much to me as long as it is organized and I can access it later.

I was able to write a recursive DOM parser method that does a depth-first traversal of the XML so that it touches every element. I also wrote a Java class with JAXB annotations that represents elementIAmInterestedIn. Then, in the recursive method, I can check when I reach an elementIAmInterestedIn and unmarshal it into an instance of the JAXB class. This works fine, except that such an object should also contain multiple otherElementIAmInterestedIn elements.

This is where I'm stuck. How can I get the data out of otherElementIAmInterestedIn and assign it to the JAXB object? I've seen the @XmlElementWrapper annotation, but it seems to work for only one layer of nesting. Also, I cannot use @XmlPath.

Maybe I should scrap that idea and use a whole new approach. I'm really just getting started with XML parsing, so perhaps I'm overlooking a more obvious solution. How would you parse an XML document structured like this and store the data in an organized way?

Recommended answer

Maybe you should use a SAX parser instead of DOM. With DOM you load the entire document into memory, while in your case you only want to read two kinds of elements. That is quite inefficient.

A SAX parser still streams through every node, but your handler only has to react to the nodes you are interested in, and nothing else is kept in memory. Here is pseudocode for your task using the SAX parsing model:

1) Keep reading nodes until you reach an <elementIAmInterestedIn> node

2) Store its data (the attribute) in your object

3) Keep reading until you reach an <otherElementIAmInterestedIn> node

4) Store its data too and save the object.

Repeat steps 1 to 4 until the parser reaches the end of the document.
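
The steps above can be sketched in Java with the JDK's built-in SAX API. This is only a minimal sketch, not a definitive implementation: the class names NestedSaxDemo, Interesting and Handler are made up for illustration, and it assumes that the children of otherElementIAmInterestedIn (data1, data2, ...) are plain text leaves, as in the sample XML.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class NestedSaxDemo {

    /** Hypothetical holder for one elementIAmInterestedIn and its nested data. */
    static final class Interesting {
        final String attribute;
        // One map (child element name -> text) per otherElementIAmInterestedIn.
        final List<Map<String, String>> others = new ArrayList<>();

        Interesting(String attribute) {
            this.attribute = attribute;
        }
    }

    /** Reacts only to the two elements of interest and ignores everything else. */
    static final class Handler extends DefaultHandler {
        final List<Interesting> results = new ArrayList<>();
        private Interesting current;        // non-null while inside elementIAmInterestedIn
        private Map<String, String> other;  // non-null while inside otherElementIAmInterestedIn
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0); // start collecting text for the element that just opened
            if (qName.equals("elementIAmInterestedIn")) {
                current = new Interesting(atts.getValue("attribute")); // steps 1 and 2
            } else if (qName.equals("otherElementIAmInterestedIn") && current != null) {
                other = new LinkedHashMap<>();                          // step 3
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // may be called several times per text node
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("otherElementIAmInterestedIn")) {
                if (current != null && other != null) {
                    current.others.add(other);                          // step 4
                }
                other = null;
            } else if (qName.equals("elementIAmInterestedIn")) {
                results.add(current);                                   // save the object
                current = null;
            } else if (other != null) {
                other.put(qName, text.toString().trim());               // data1, data2, ...
            }
        }
    }

    /** Streams the document once and returns the collected objects. */
    static List<Interesting> parse(InputStream in) throws Exception {
        Handler handler = new Handler();
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return handler.results;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<element1><element2><elementIAmInterestedIn attribute=\"data\">"
                + "<element4><otherElementIAmInterestedIn>"
                + "<data1>text1</data1><data2>text2</data2>"
                + "</otherElementIAmInterestedIn></element4>"
                + "</elementIAmInterestedIn></element2></element1>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        for (Interesting item : parse(in)) {
            // prints: data -> [{data1=text1, data2=text2}]
            System.out.println(item.attribute + " -> " + item.others);
        }
    }
}
```

Because the handler only tracks which of the two elements it is currently inside, the depth of the wrapper elements (element4, element5, element6) does not matter, and one elementIAmInterestedIn can collect any number of otherElementIAmInterestedIn children.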

If you try this approach, I suggest you first read this document to understand how a SAX parser works; it is very different from the DOM approach: How to Use SAX
