How to properly parse XML with SAX?

Question

I am receiving an XML document from a REST service which shall be parsed using SAX. Please see the following example which was generated out of the XSD.

Setting up the parser is not a problem. My main problem is the actual processing in the startElement(), endElement() methods etc. I don't understand how to extract the items I need and store them as they are somewhat "nested".

The ConnectionList can occur once or twice and may contain any number of Connection elements which, in turn, hold the details of a connection. Basically, I need a list of all connections with their Date, Transfers and Time. Do I have to create one class per element?

As far as I understand it, I need to do the following. If the parser comes across a...


  • ConnectionList: Create new ConnectionList object and put it into a list of ConnectionLists
  • Connection: Create a new Connection object and put it into a list of Connections
  • Date, Transfers, Time (only if parent is Duration): Store the node value in the current Connection object

I'd really appreciate any help, hint, idea, snippet how I can achieve this.

Thanks :-)

Robert

<?xml version="1.0" encoding="UTF-8"?>
<ResC xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <Err code="r5E5a1Wm" text="tk-gWYbw" level="E"/>
    <Err code="takVDd34" text="XtvyjmjPuscK" level="E"/>
    <Err code="hQ1-:aDQ" text="YWc5qtY.gkwCeJW2S" level="E"/>
    <ConRes dir="R">
        <Err code="ZfwPC:tj" text="RKKFuLXoM0oOfp3a" level="E"/>
        <Err code="bhDjSJPa" text="BJoHuOMdwzhcddW" level="E"/>
        <Err code="CX-NhK9r" text="j55qy-WiNPXu" level="E"/>
        <ConResCtxt b="1" f="1">0815</ConResCtxt>
        <ConnectionList type="IV">
            <Err code="WI3WX.jo" text="rK3H5jwa-Zfen3" level="E"/>
            <Connection id="ID000">
                <Overview>
                    <Date>b3lcM_Yiyq7dqL9</Date>
                    <Departure>
                        <BasicStop type="NORMAL" index="-1086549314">
                            <Address externalId="t.EdKe93xkqFqLwPzgd-4vHSJemy8"
                                externalStationNr="1332105793" name="fdREYJPu83WV503V8szdCX"
                                x="951177990" y="-1579782776" z="1807457957" type="WGS84"/>
                        </BasicStop>
                    </Departure>
                    <Arrival>
                        <BasicStop type="NORMAL" index="1897526979">
                            <Address externalId="l7h_GTUit6fv" externalStationNr="-1670310329"
                                name="WJznDTzkTvyET51pfr7X" x="-1738098662" y="-170353174"
                                z="-475585957" type="WGS84"/>
                        </BasicStop>
                    </Arrival>
                    <Transfers>dZbgZfDH8j1hb1i</Transfers>
                    <Duration>
                        <Time>00d00:18:00</Time>
                    </Duration>
                    <ServiceDays> </ServiceDays>
                    <Products>
                        <Product cat="qmrN2dShHJp"/>
                        <Product cat="Hg"/>
                        <Product cat="nurxhdl3w.P0x7FRv2J3UoF"/>
                    </Products>
                    <ContextURL url="http://FzgEqiVC/"/>
                </Overview>
            </Connection>
            <Connection id="ID004">
                <Overview>
                    <Date>W5a47DRkc7XDZjhwq_s5Un.</Date>
                    <Departure>
                        <BasicStop type="NORMAL" index="-1014429844">
                            <Address externalId="RMnzjEFOTTdM1oaAUw" externalStationNr="1429101638"
                                name="HF-1" x="1005198487" y="570832676" z="975615566" type="WGS84"
                            />
                        </BasicStop>
                    </Departure>
                    <Arrival>
                        <BasicStop type="NORMAL" index="-58308182">
                            <Address externalId="rVdwdQvAukfj2QcA7b3OSdGOyW"
                                externalStationNr="1142334006" name="g" x="-1791416159"
                                y="-541300941" z="478129823" type="WGS84"/>
                        </BasicStop>
                    </Arrival>
                    <Transfers>GG56XN6zgiJF804mE_N4o</Transfers>
                    <Duration> </Duration>
                    <ServiceDays> </ServiceDays>
                    <Products>
                        <Product cat="fs_Oyoy9NYBai-qaxbty6j9Y7r1St"/>
                        <Product cat="P2CbaSGpC"/>
                        <Product cat="CGZrqSIDM6M4kUlb8_xZ8jRlH4c"/>
                    </Products>
                    <ContextURL url="http://JkRhuXtu/"/>
                </Overview>
            </Connection>
        </ConnectionList>
        <ConnectionList type="IV">
            <Err code="0lFWRY2X" text="KLmdczFRhV" level="E"/>
            <Connection id="ID012">
                <Overview>
                    <Date>t8mn634zjCZsRPyxj_e_-UYMH</Date>
                    <Departure>
                        <BasicStop type="NORMAL" index="-2095085423">
                            <Address externalId="ftKAFG-Uk7x" externalStationNr="1390920810"
                                name="JQrQXOQbm.FLaCMeSiTYjT" x="1970142849" y="-655980297"
                                z="2102464970" type="WGS84"/>
                        </BasicStop>
                    </Departure>
                    <Arrival>
                        <BasicStop type="NORMAL" index="1552118247">
                            <Address externalId="qcBpeuPDRzvSt1o" externalStationNr="-1133118359"
                                name="AJiJOB1t" x="-1422533132" y="-1158953133" z="484831466"
                                type="WGS84"/>
                        </BasicStop>
                    </Arrival>
                    <Transfers>D0MiUwW9nuuM_uykvawg2C07pwHL</Transfers>
                    <Duration> </Duration>
                    <ServiceDays> </ServiceDays>
                    <Products>
                        <Product cat="LpGOZbLDbJm"/>
                        <Product cat="JIv-szQVX2icPb"/>
                        <Product cat="Q7-pthWoOT"/>
                    </Products>
                    <ContextURL url="http://zGWgivvi/"/>
                </Overview>
                <IList>
                    <I header="ze4Wt3hVD-DvjujY6QKae" text="lVwB4RxAHcYq3.F"
                        uriCustom="iVjQJCoU1MVOv2Z9lwarP"/>
                    <I header="z-i.au59soMzXLZCbV" text="PoTP" uriCustom="ksrbwEH6scNR"/>
                    <I header="N" text="jHDA4" uriCustom="ub95811lMIa_495ZbPOuNWL0rRWh"/>
                </IList>
                <CommentList>
                    <Comment id="ID013">
                        <Text lang="EN"> </Text>
                        <Text lang="FR"> </Text>
                        <Text lang="PL"> </Text>
                    </Comment>
                    <Comment id="ID014">
                        <Text lang="DK"> </Text>
                        <Text lang="IT"> </Text>
                        <Text lang="IT"> </Text>
                    </Comment>
                    <Comment id="ID015">
                        <Text lang="MACRO"> </Text>
                        <Text lang="IT"> </Text>
                        <Text lang="EN"> </Text>
                    </Comment>
                </CommentList>
            </Connection>
        </ConnectionList>
    </ConRes>
</ResC>


Recommended answer

The best way I've found (so far) of parsing XML using SAX is to use a stack and conditional statements in the relevant callbacks. Here's an article describing it, and my summary of it:

The basic premise is that as you parse the document, you create objects to store the parsed data, pushing them onto the stack as you go, peeking at the top of the stack to add data to the current element, and at the end of each element popping it off the stack and storing it in the parent.

The effect is that you parse the tree of elements depth first, and at the end of each branch you roll it back into the parent until you're left with a single object (such as your ConnectionList) that contains all of the parsed data, ready to be used. Essentially, you end up with a set of objects that mirrors the structure of the original XML.

That means you need some data objects that can store the data in the same structure as the XML. Complex elements will normally become classes, while simple elements will normally be attributes within classes. The root element is often represented by a list of some kind.
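For the document in the question, that mapping might look roughly like the sketch below. The class and field names are hypothetical, chosen only to mirror the element names in the sample XML; the question asks for Date, Transfers and Time, so only those are modelled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical data holders for the parts of the XML the question asks about. */
class ConnectionModel {

    /** One <Connection> element: a complex element, so it becomes a class. */
    static class Connection {
        String id;         // "id" attribute of <Connection>
        String date;       // text of <Date>
        String transfers;  // text of <Transfers>
        String time;       // text of <Duration>/<Time>; stays null when absent
    }

    /** One <ConnectionList> element: holds any number of connections. */
    static class ConnectionList {
        String type;  // "type" attribute of <ConnectionList>
        final List<Connection> connections = new ArrayList<>();
    }

    public static void main(String[] args) {
        ConnectionList list = new ConnectionList();
        list.type = "IV";

        Connection c = new Connection();
        c.id = "ID000";
        c.time = "00d00:18:00";
        list.connections.add(c);

        System.out.println(list.connections.size() + " connection(s) of type " + list.type);
    }
}
```

Simple elements such as Date end up as plain fields, so there is no need for one class per element.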

To start with, you create a stack object to hold the data as you parse it.

Then, at the start of each element, you identify what type it is using localName.equals(), create an instance of the appropriate class, and push it onto the stack. (Note that localName is only populated when the parser is namespace-aware; otherwise compare against qName instead.) If the element is a simple element, you will probably model it as an attribute of the class representing the parent element, and you will need a set of flags that tell the handler whether such an element has been encountered and which one it is, so that it can be processed in the characters() method.

The actual data is read using the characters() method, and again you use conditional logic to determine what to do with the data, based on the value of the flag. Essentially, you peek at the top of the stack and use the appropriate method to write the data into the object, converting from text where necessary.
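One detail worth keeping in mind: a SAX parser is allowed to deliver the text of a single element in several characters() calls, so it is usually safer to collect the text in a buffer and only use it in endElement(). The sketch below is one way to do that, not the approach from the article; the class name, the `date` field and the `parseDate` helper are made up for illustration, and the parser is set to be namespace-aware so that localName is filled in.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Sketch: buffer element text, because characters() may fire more than once per element. */
class TextBufferSketch extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    String date;  // hypothetical field, filled from <Date>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);  // a new element begins: discard text collected so far
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);  // possibly only a fragment of the element's text
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("Date".equals(localName)) {
            date = text.toString().trim();  // the complete text is only known here
        }
        text.setLength(0);
    }

    /** Hypothetical helper: parses the given XML string and returns the last <Date> text. */
    static String parseDate(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);  // without this, localName may be an empty string
        TextBufferSketch handler = new TextBufferSketch();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.date;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseDate("<Overview><Date>b3lcM_Yiyq7dqL9</Date></Overview>"));
    }
}
```

With a buffer like this, the flags described above are less critical, because the element name is available again in endElement().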

At the end of each element, you pop the top of the stack and use localName.equals() again to determine how to store the popped object in the one now on top of the stack, i.e. its parent (e.g. which setter method needs to be called).

When you reach the end of the document you should have captured all the data in the document.
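Put together, the steps above could be sketched as follows for the document in the question. This is a minimal illustration under a few assumptions rather than a definitive implementation: all class, field and method names are hypothetical, every Connection is assumed to sit inside a ConnectionList (as in the sample), element text is buffered instead of tracked with flags, and a second stack of element names is used to check that Time has Duration as its parent.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Sketch of a stack-based SAX handler; all names here are hypothetical. */
class ConnectionHandlerSketch extends DefaultHandler {
    static class Connection { String id, date, transfers, time; }
    static class ConnectionList { final List<Connection> connections = new ArrayList<>(); }

    final List<ConnectionList> result = new ArrayList<>();
    private final Deque<Object> objects = new ArrayDeque<>();  // data objects being built
    private final Deque<String> path = new ArrayDeque<>();     // names of the open elements
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("ConnectionList".equals(localName)) {
            objects.push(new ConnectionList());
        } else if ("Connection".equals(localName)) {
            Connection c = new Connection();
            c.id = atts.getValue("id");
            objects.push(c);
        }
        path.push(localName);
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        path.pop();
        String parent = path.peek();  // null when the root element closes
        String value = text.toString().trim();
        text.setLength(0);

        // Simple elements: write the text into the Connection on top of the stack.
        if (objects.peek() instanceof Connection) {
            Connection c = (Connection) objects.peek();
            if ("Date".equals(localName)) c.date = value;
            else if ("Transfers".equals(localName)) c.transfers = value;
            else if ("Time".equals(localName) && "Duration".equals(parent)) c.time = value;
        }

        // Complex elements: pop the finished object and store it in its parent.
        if ("Connection".equals(localName)) {
            Connection done = (Connection) objects.pop();
            ((ConnectionList) objects.peek()).connections.add(done);
        } else if ("ConnectionList".equals(localName)) {
            result.add((ConnectionList) objects.pop());
        }
    }

    static List<ConnectionList> parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);  // needed so that localName is populated
        ConnectionHandlerSketch handler = new ConnectionHandlerSketch();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<ResC><ConRes dir=\"R\"><ConnectionList type=\"IV\">"
            + "<Connection id=\"ID000\"><Overview><Date>b3lcM_Yiyq7dqL9</Date>"
            + "<Transfers>dZbgZfDH8j1hb1i</Transfers>"
            + "<Duration><Time>00d00:18:00</Time></Duration></Overview></Connection>"
            + "</ConnectionList></ConRes></ResC>";
        for (ConnectionList list : parse(xml)) {
            for (Connection c : list.connections) {
                System.out.println(c.id + " " + c.date + " " + c.transfers + " " + c.time);
            }
        }
    }
}
```

Elements the handler does not care about (Err, Departure, Products and so on) are simply skipped, and a Connection whose Duration is empty ends up with a null time.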
