How to parse XML using the SAX parser
Question
I'm following this tutorial.
It works great, but I would like it to return an array with all the strings instead of a single string containing only the last element.
Any ideas how to do this?
Recommended answer
So you want to build an XML parser to parse an RSS feed like this one.
<rss version="0.92">
    <channel>
        <title>MyTitle</title>
        <link>http://myurl.com</link>
        <description>MyDescription</description>
        <lastBuildDate>SomeDate</lastBuildDate>
        <docs>http://someurl.com</docs>
        <language>SomeLanguage</language>
        <item>
            <title>TitleOne</title>
            <description><![CDATA[Some text.]]></description>
            <link>http://linktoarticle.com</link>
        </item>
        <item>
            <title>TitleTwo</title>
            <description><![CDATA[Some other text.]]></description>
            <link>http://linktoanotherarticle.com</link>
        </item>
    </channel>
</rss>
Now you have two SAX implementations you can work with: either the org.xml.sax or the android.sax implementation. I'm going to explain the pros and cons of both after posting a short hands-on example.
android.sax implementation
Let's start with the android.sax implementation.
You first have to define the XML structure using the RootElement and Element objects.
In any case, I would work with POJOs (Plain Old Java Objects) to hold your data. Here are the POJOs needed.
Channel.java
import java.io.Serializable;

public class Channel implements Serializable {
    private Items items;
    private String title;
    private String link;
    private String description;
    private String lastBuildDate;
    private String docs;
    private String language;

    public Channel() {
        setItems(null);
        setTitle(null);
        // set every field to null in the constructor
    }

    public void setItems(Items items) {
        this.items = items;
    }

    public Items getItems() {
        return items;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public String getTitle() {
        return title;
    }

    // the rest of the class looks similar: just setters and getters
}
This class implements the Serializable interface so you can put it into a Bundle and do something with it.
Now we need a class to hold our items. In this case I'm just going to extend the ArrayList class.
Items.java
import java.util.ArrayList;

public class Items extends ArrayList<Item> {
    public Items() {
        super();
    }
}
That's it for our items container. We now need a class to hold the data of every single item.
Item.java
import java.io.Serializable;

public class Item implements Serializable {
    private String title;
    private String description;
    private String link;

    public Item() {
        setTitle(null);
        setDescription(null);
        setLink(null);
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public String getTitle() {
        return title;
    }

    // the remaining setters and getters follow the same pattern
}
Example.java
import java.io.IOException;
import java.io.InputStream;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;

import android.sax.Element;
import android.sax.EndElementListener;
import android.sax.EndTextElementListener;
import android.sax.RootElement;
import android.sax.StartElementListener;
import android.util.Xml;

// No DefaultHandler superclass is needed here: the ContentHandler
// comes from root.getContentHandler().
public class Example {
    private Channel channel;
    private Items items;
    private Item item;

    public Example() {
        items = new Items();
    }

    public Channel parse(InputStream is) {
        // Declare the XML structure we expect.
        RootElement root = new RootElement("rss");
        Element chanElement = root.getChild("channel");
        Element chanTitle = chanElement.getChild("title");
        Element chanLink = chanElement.getChild("link");
        Element chanDescription = chanElement.getChild("description");
        Element chanLastBuildDate = chanElement.getChild("lastBuildDate");
        Element chanDocs = chanElement.getChild("docs");
        Element chanLanguage = chanElement.getChild("language");

        Element chanItem = chanElement.getChild("item");
        Element itemTitle = chanItem.getChild("title");
        Element itemDescription = chanItem.getChild("description");
        Element itemLink = chanItem.getChild("link");

        chanElement.setStartElementListener(new StartElementListener() {
            public void start(Attributes attributes) {
                channel = new Channel();
            }
        });

        // Listen for the end of a text element and set the text as our
        // channel's title.
        chanTitle.setEndTextElementListener(new EndTextElementListener() {
            public void end(String body) {
                channel.setTitle(body);
            }
        });

        // The same pattern applies to the other elements of channel.

        // On every <item> tag occurrence we create a new Item object.
        chanItem.setStartElementListener(new StartElementListener() {
            public void start(Attributes attributes) {
                item = new Item();
            }
        });

        // On every </item> tag occurrence we add the current Item object
        // to the Items container.
        chanItem.setEndElementListener(new EndElementListener() {
            public void end() {
                items.add(item);
            }
        });

        itemTitle.setEndTextElementListener(new EndTextElementListener() {
            public void end(String body) {
                item.setTitle(body);
            }
        });

        // and so on

        // Here we actually parse the InputStream and return the resulting
        // Channel object.
        try {
            Xml.parse(is, Xml.Encoding.UTF_8, root.getContentHandler());
            // Attach the collected items; channel stays null if the feed
            // contained no <channel> element.
            if (channel != null) {
                channel.setItems(items);
            }
            return channel;
        } catch (SAXException e) {
            // handle the exception
        } catch (IOException e) {
            // handle the exception
        }

        return null;
    }
}
As you can see, that was a very quick example. The major advantage of using the android.sax SAX implementation is that you can define the structure of the XML you have to parse and then just add an event listener to the appropriate elements. The disadvantage is that the code gets quite repetitive and bloated.
org.xml.sax implementation
The org.xml.sax SAX handler implementation is a bit different.
Here you don't specify or declare your XML structure; you just listen for events. The most widely used events are the following:
- Document start
- Document end
- Element start
- Element end
- Characters between element start and element end
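To see the order in which these events fire, here is a minimal sketch that records each callback. It is not part of the original answer: the class name EventLogger and the helper trace are made up for illustration, and it uses the JDK's SAXParserFactory instead of Android's Xml.parse so it can run on a desktop JVM. It compares against qName because the default JDK factory is not namespace-aware, in which case localName may be empty.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration: records SAX events in arrival order.
class EventLogger extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("start:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver text in several chunks; whitespace-only
        // chunks are skipped here to keep the trace readable.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            events.add("chars:" + text);
        }
    }

    // Parses the given XML string and returns the recorded events.
    static List<String> trace(String xml) throws Exception {
        EventLogger logger = new EventLogger();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), logger);
        return logger.events;
    }
}
```

Calling EventLogger.trace on a small fragment such as an item with a single title shows the document events wrapping the element events, with the character data arriving between the matching start and end of its element.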
An example handler implementation using the Channel object above looks like this.
ExampleHandler.java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class ExampleHandler extends DefaultHandler {
    private Channel channel;
    private Items items;
    private Item item;
    private boolean inItem = false;
    private StringBuilder content;

    public ExampleHandler() {
        items = new Items();
        content = new StringBuilder();
    }

    // Note: on a parser that is not namespace-aware, localName may be
    // empty; compare against qName there instead.
    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes atts) throws SAXException {
        // Start a fresh text buffer for every element.
        content = new StringBuilder();
        if (localName.equalsIgnoreCase("channel")) {
            channel = new Channel();
        } else if (localName.equalsIgnoreCase("item")) {
            inItem = true;
            item = new Item();
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        if (localName.equalsIgnoreCase("title")) {
            if (inItem) {
                item.setTitle(content.toString());
            } else {
                channel.setTitle(content.toString());
            }
        } else if (localName.equalsIgnoreCase("link")) {
            if (inItem) {
                item.setLink(content.toString());
            } else {
                channel.setLink(content.toString());
            }
        } else if (localName.equalsIgnoreCase("description")) {
            if (inItem) {
                item.setDescription(content.toString());
            } else {
                channel.setDescription(content.toString());
            }
        } else if (localName.equalsIgnoreCase("lastBuildDate")) {
            channel.setLastBuildDate(content.toString());
        } else if (localName.equalsIgnoreCase("docs")) {
            channel.setDocs(content.toString());
        } else if (localName.equalsIgnoreCase("language")) {
            channel.setLanguage(content.toString());
        } else if (localName.equalsIgnoreCase("item")) {
            inItem = false;
            items.add(item);
        } else if (localName.equalsIgnoreCase("channel")) {
            channel.setItems(items);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        // Text can arrive in several chunks, so append rather than assign.
        content.append(ch, start, length);
    }

    @Override
    public void endDocument() throws SAXException {
        // You can do something here, for example send
        // the Channel object somewhere.
    }
}
To be honest, I can't really point to any real advantage of this handler implementation over the android.sax one. I can, however, tell you the disadvantage, which should be pretty obvious by now. Take a look at the else if chain in the endElement method. Because the <title>, <link> and <description> tags appear both directly under <channel> and inside each <item>, we have to track where in the XML structure we currently are. That is, when we encounter an <item> start tag in startElement we set the inItem flag to true, so that we map the correct data to the correct object, and when we encounter the </item> tag in endElement we set the flag back to false to signal that we are done with that item.
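Applied to the original question (returning every string rather than only the last one), the same inItem flag can drive a handler that appends each item title to a list instead of overwriting a single field. The following is a sketch only: TitleCollector and parseTitles are hypothetical names, and it runs on the JDK's SAXParserFactory, which is not namespace-aware by default, so it compares against qName rather than localName.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class for illustration: collects every <item><title> value.
class TitleCollector extends DefaultHandler {
    private final List<String> titles = new ArrayList<>();
    private final StringBuilder content = new StringBuilder();
    private boolean inItem = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        content.setLength(0); // start buffering text for this element
        if (qName.equals("item")) {
            inItem = true;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("item")) {
            inItem = false;
        } else if (qName.equals("title") && inItem) {
            // Append to the list; the channel-level <title> is ignored
            // because inItem is false when it ends.
            titles.add(content.toString());
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        content.append(ch, start, length);
    }

    // Parses the XML string and returns all item titles in document order.
    static List<String> parseTitles(String xml) throws Exception {
        TitleCollector handler = new TitleCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.titles;
    }
}
```

For the sample feed at the top of this answer, parseTitles would return a list holding TitleOne and TitleTwo, while the channel's own MyTitle is left out.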
In this example that is pretty easy to manage, but parsing a more complex structure with repeating tags at different levels becomes tricky. There you would have to either use enums, for example, to track your current state along with a lot of switch/case statements to check where you are, or, more elegantly, use some kind of tag tracker built on a tag stack.
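One possible shape for such a tag tracker, as a rough sketch rather than a definitive design: push each element name onto a stack in startElement, pop it in endElement, and use the joined stack as the current path. The names PathTrackingHandler and collect are invented for this illustration, and the code assumes the JDK's default (non namespace-aware) SAXParserFactory, hence the use of qName.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: records each text value together with its full
// element path, e.g. "rss/channel/item/title=TitleOne".
class PathTrackingHandler extends DefaultHandler {
    private final Deque<String> tagStack = new ArrayDeque<>();
    private final StringBuilder content = new StringBuilder();
    final List<String> values = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        tagStack.addLast(qName); // root stays first, current element is last
        content.setLength(0);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String path = String.join("/", tagStack);
        String text = content.toString().trim();
        if (!text.isEmpty()) {
            values.add(path + "=" + text);
        }
        content.setLength(0); // do not leak child text into the parent
        tagStack.removeLast();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        content.append(ch, start, length);
    }

    // Parses the XML string and returns the recorded path=value pairs.
    static List<String> collect(String xml) throws Exception {
        PathTrackingHandler handler = new PathTrackingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.values;
    }
}
```

With the path available, a title under rss/channel and a title under rss/channel/item are told apart by a single string comparison, so no boolean flag is needed per nesting level.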