How to parse XML using the SAX parser

Question

I'm following this tutorial.

It works great, but I would like it to return an array with all the strings instead of a single string with the last element.

Any ideas how to do this?

Recommended answer

So you want to build an XML parser to parse an RSS feed like this one.

<rss version="0.92">
<channel>
    <title>MyTitle</title>
    <link>http://myurl.com</link>
    <description>MyDescription</description>
    <lastBuildDate>SomeDate</lastBuildDate>
    <docs>http://someurl.com</docs>
    <language>SomeLanguage</language>

    <item>
        <title>TitleOne</title>
        <description><![CDATA[Some text.]]></description>
        <link>http://linktoarticle.com</link>
    </item>

    <item>
        <title>TitleTwo</title>
        <description><![CDATA[Some other text.]]></description>
        <link>http://linktoanotherarticle.com</link>
    </item>

</channel>
</rss>

Now you have two SAX implementations you can work with: either the org.xml.sax or the android.sax implementation. I'm going to explain the pros and cons of both after posting a short example of each.

android.sax implementation

Let's start with the android.sax implementation.

You first have to define the XML structure using the RootElement and Element objects.

In any case I would work with POJOs (Plain Old Java Objects) to hold your data. Here are the POJOs needed.

Channel.java

import java.io.Serializable;

public class Channel implements Serializable {

    private Items items;
    private String title;
    private String link;
    private String description;
    private String lastBuildDate;
    private String docs;
    private String language;

    public Channel() {
        setItems(null);
        setTitle(null);
        // set every field to null in the constructor
    }

    public void setItems(Items items) {
        this.items = items;
    }

    public Items getItems() {
        return items;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public String getTitle() {
        return title;
    }
    // rest of the class looks similar so just setters and getters
}

This class implements the Serializable interface so you can put it into a Bundle and do something with it.

Now we need a class to hold our items. In this case I'm just going to extend the ArrayList class.

Items.java

import java.util.ArrayList;

public class Items extends ArrayList<Item> {

    public Items() {
        super();
    }

}

That's it for our items container. We now need a class to hold the data of every single item.

Item.java

import java.io.Serializable;

public class Item implements Serializable {

    private String title;
    private String description;
    private String link;

    public Item() {
        setTitle(null);
        setDescription(null);
        setLink(null);
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public String getTitle() {
        return title;
    }

    // same as above.

}

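Example:

import java.io.IOException;
import java.io.InputStream;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import android.sax.Element;
import android.sax.EndElementListener;
import android.sax.EndTextElementListener;
import android.sax.RootElement;
import android.sax.StartElementListener;
import android.util.Xml;

public class Example extends DefaultHandler {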

    private Channel channel;
    private Items items;
    private Item item;

    public Example() {
        items = new Items();
    }

    public Channel parse(InputStream is) {
        RootElement root = new RootElement("rss");
        Element chanElement = root.getChild("channel");
        Element chanTitle = chanElement.getChild("title");
        Element chanLink = chanElement.getChild("link");
        Element chanDescription = chanElement.getChild("description");
        Element chanLastBuildDate = chanElement.getChild("lastBuildDate");
        Element chanDocs = chanElement.getChild("docs");
        Element chanLanguage = chanElement.getChild("language");

        Element chanItem = chanElement.getChild("item");
        Element itemTitle = chanItem.getChild("title");
        Element itemDescription = chanItem.getChild("description");
        Element itemLink = chanItem.getChild("link");

        chanElement.setStartElementListener(new StartElementListener() {
            public void start(Attributes attributes) {
                channel = new Channel();
            }
        });

        // Listen for the end of a text element and set the text as our
        // channel's title.
        chanTitle.setEndTextElementListener(new EndTextElementListener() {
            public void end(String body) {
                channel.setTitle(body);
            }
        });

        // The same thing happens for the other elements of channel, etc.

        // On every <item> tag occurrence we create a new Item object.
        chanItem.setStartElementListener(new StartElementListener() {
            public void start(Attributes attributes) {
                item = new Item();
            }
        });

        // On every </item> tag occurrence we add the current Item object
        // to the Items container.
        chanItem.setEndElementListener(new EndElementListener() {
            public void end() {
                items.add(item);
            }
        });

        itemTitle.setEndTextElementListener(new EndTextElementListener() {
            public void end(String body) {
                item.setTitle(body);
            }
        });

        // and so on

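        // Here we actually parse the InputStream and return the resulting
        // Channel object.
        try {
            Xml.parse(is, Xml.Encoding.UTF_8, root.getContentHandler());
            // Attach the collected items; without this the returned Channel
            // would carry no items. channel is null if no <channel> was seen.
            if (channel != null) {
                channel.setItems(items);
            }
            return channel;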
        } catch (SAXException e) {
            // handle the exception
        } catch (IOException e) {
            // handle the exception
        }

        return null;
    }

}

Now that was a very quick example, as you can see. The major advantage of using the android.sax SAX implementation is that you can define the structure of the XML you have to parse and then just add an event listener to the appropriate elements. The disadvantage is that the code gets quite repetitive and bloated.

org.xml.sax implementation

The org.xml.sax SAX handler implementation is a bit different.

Here you don't specify or declare your XML structure but just listen for events. The most widely used events are the following:

  • Document start
  • Document end
  • Element start
  • Element end
  • Characters between element start and element end
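The events listed above can be watched directly with a small sketch like the following. It is only an illustration that runs on a plain JVM through javax.xml.parsers rather than on Android; the EventLogger class and its log method are hypothetical names that do not appear in the original answer.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records each SAX callback in the order it arrives.
class EventLogger extends DefaultHandler {

    final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("startElement:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver one text node in several chunks;
        // whitespace-only chunks are skipped to keep the log short.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            events.add("characters:" + text);
        }
    }

    // Parses the given XML string and returns the recorded events.
    static List<String> log(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        EventLogger handler = new EventLogger();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : log("<item><title>TitleOne</title></item>")) {
            System.out.println(event);
        }
    }
}
```

The log always starts with startDocument and ends with endDocument, with the element and character events nested in document order in between.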

An example handler implementation using the Channel object above looks like this.

Example

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class ExampleHandler extends DefaultHandler {

    private Channel channel;
    private Items items;
    private Item item;
    private boolean inItem = false;

    private StringBuilder content;

    public ExampleHandler() {
        items = new Items();
        content = new StringBuilder();
    }

    public void startElement(String uri, String localName, String qName, 
            Attributes atts) throws SAXException {
        content = new StringBuilder();
        if(localName.equalsIgnoreCase("channel")) {
            channel = new Channel();
        } else if(localName.equalsIgnoreCase("item")) {
            inItem = true;
            item = new Item();
        }
    }

    public void endElement(String uri, String localName, String qName) 
            throws SAXException {
        if(localName.equalsIgnoreCase("title")) {
            if(inItem) {
                item.setTitle(content.toString());
            } else {
                channel.setTitle(content.toString());
            }
        } else if(localName.equalsIgnoreCase("link")) {
            if(inItem) {
                item.setLink(content.toString());
            } else {
                channel.setLink(content.toString());
            }
        } else if(localName.equalsIgnoreCase("description")) {
            if(inItem) {
                item.setDescription(content.toString());
            } else {
                channel.setDescription(content.toString());
            }
        } else if(localName.equalsIgnoreCase("lastBuildDate")) {
            channel.setLastBuildDate(content.toString());
        } else if(localName.equalsIgnoreCase("docs")) {
            channel.setDocs(content.toString());
        } else if(localName.equalsIgnoreCase("language")) {
            channel.setLanguage(content.toString());
        } else if(localName.equalsIgnoreCase("item")) {
            inItem = false;
            items.add(item);
        } else if(localName.equalsIgnoreCase("channel")) {
            channel.setItems(items);
        }
    }

    public void characters(char[] ch, int start, int length) 
            throws SAXException {
        content.append(ch, start, length);
    }

    public void endDocument() throws SAXException {
        // you can do something here for example send
        // the Channel object somewhere or whatever.
    }

}
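Since the question asked for all the strings rather than only the last one, here is a minimal sketch of the same pattern that collects every item title into a list. It is an assumption-laden illustration rather than part of the original answer: ItemTitleHandler and parseTitles are hypothetical names, the parser comes from javax.xml.parsers on a plain JVM, and namespace awareness is switched on so that localName is populated there.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of every <title> found inside an <item>.
class ItemTitleHandler extends DefaultHandler {

    private final List<String> titles = new ArrayList<>();
    private final StringBuilder content = new StringBuilder();
    private boolean inItem = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        content.setLength(0); // start collecting text for the new element
        if ("item".equals(localName)) {
            inItem = true;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(localName) && inItem) {
            // Add to the list instead of overwriting a single field.
            titles.add(content.toString().trim());
        } else if ("item".equals(localName)) {
            inItem = false;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        content.append(ch, start, length);
    }

    List<String> getTitles() {
        return titles;
    }

    // Parses the stream and returns all item titles in document order.
    static List<String> parseTitles(InputStream in) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // assumption: needed so localName is filled in on a plain JVM
        ItemTitleHandler handler = new ItemTitleHandler();
        factory.newSAXParser().parse(in, handler);
        return handler.getTitles();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rss version=\"0.92\"><channel><title>MyTitle</title>"
                + "<item><title>TitleOne</title></item>"
                + "<item><title>TitleTwo</title></item></channel></rss>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(parseTitles(in));
    }
}
```

The channel title is skipped because inItem is false at that point, so the list holds only the item titles. Calling toArray(new String[0]) on the result gives an array if one is really needed.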

Now to be honest, I can't really tell you any real advantage of this handler implementation over the android.sax one. I can, however, tell you the disadvantage, which should be pretty obvious by now. Take a look at the else if statement in the startElement method. Because the tags <title>, <link> and <description> appear both in the channel and in each item, we have to track where in the XML structure we are at the moment. That is, if we encounter an <item> start tag we set the inItem flag to true to ensure that we map the correct data to the correct object, and in the endElement method we set that flag to false when we encounter an </item> tag, to signal that we are done with that item.

In this example that is pretty easy to manage, but parsing a more complex structure with repeating tags at different levels becomes tricky. There you'd have to either use enums, for example, to hold your current state, together with a lot of switch/case statements to check where you are, or, as a more elegant solution, use some kind of tag tracker built on a tag stack.
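The tag stack idea could look roughly like the sketch below. This is one possible shape under stated assumptions, not the answer author's implementation: PathTrackingHandler, textByPath and parse are hypothetical names, and the parser is obtained from javax.xml.parsers on a plain JVM. Every open element is pushed onto a stack, and text is stored under the full path of the element it belongs to, so a title in the channel and a title in an item no longer need an inItem flag to be told apart.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical tag tracker: keeps the open elements on a stack and
// groups element text by its full path, e.g. "rss/channel/item/title".
class PathTrackingHandler extends DefaultHandler {

    private final Deque<String> openTags = new ArrayDeque<>();
    private final StringBuilder content = new StringBuilder();
    final Map<String, List<String>> textByPath = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        openTags.addLast(qName); // innermost element sits at the tail
        content.setLength(0);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String text = content.toString().trim();
        if (!text.isEmpty()) {
            textByPath.computeIfAbsent(currentPath(), k -> new ArrayList<>()).add(text);
        }
        content.setLength(0);
        openTags.removeLast();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        content.append(ch, start, length);
    }

    // Joins the stack from the outermost to the innermost tag.
    private String currentPath() {
        return String.join("/", openTags);
    }

    // Parses the given XML string and returns the collected text per path.
    static Map<String, List<String>> parse(String xml) throws Exception {
        PathTrackingHandler handler = new PathTrackingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.textByPath;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rss><channel><title>MyTitle</title>"
                + "<item><title>TitleOne</title></item>"
                + "<item><title>TitleTwo</title></item></channel></rss>";
        System.out.println(parse(xml));
    }
}
```

With this shape, a deeper or more repetitive document only adds more distinct paths; the handler itself does not grow extra flags or switch/case branches.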