Java SAX Parsing

Question

I have an XML stream that I need to parse. Since I only need to read it once and build my Java objects, SAX looks like the natural choice. I'm extending DefaultHandler and implementing the startElement, endElement and characters methods, with member fields in my class that hold the most recently read value (captured in the characters method).

I have no problem doing what I need, but my code has become quite complex, and I'm sure it doesn't have to be. The structure of my XML is something like this:

<players>
  <player>
    <id></id>
    <name></name>
    <teams total="2">
      <team>
        <id></id>
        <name></name>
        <start-date>
          <year>2009</year>
          <month>9</month>
        </start-date>
        <is-current>true</is-current>
      </team>
      <team>
        <id></id>
        <name></name>
        <start-date>
          <year>2007</year>
          <month>11</month>
        </start-date>
        <end-date>
          <year>2009</year>
          <month>7</month>
        </end-date>
      </team>
    </teams>
  </player>
</players>

My problem started when I realized that the same tag names are used in several areas of the file. For example, id and name exist for both a player and a team. I want to create instances of my Java classes Player and Team. While parsing, I keep boolean flags that tell me whether I'm inside the teams section, so that in endElement I know that a name is a team's name rather than a player's name, and so on.

Here's what my code looks like:

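import java.util.ArrayList;
import java.util.List;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class MyParser extends DefaultHandler {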

    private String currentValue;
    private boolean inTeamsSection = false;
    private Player player;
    private Team team;
    private List<Team> teams;

    public void characters(char[] ch, int start, int length) throws SAXException {
        currentValue = new String(ch, start, length);
    }

    public void startElement(String uri, String localName, String name, Attributes attributes) throws SAXException {
        if(name.equals("player")){
            player = new Player();
        }
        if (name.equals("teams")) {
            inTeamsSection = true;
            teams = new ArrayList<Team>();
        }
        if (name.equals("team")){
            team = new Team();
        }
    }   

    public void endElement(String uri, String localName, String name) throws SAXException {
        if (name.equals("id")) {
            if(inTeamsSection){
                team.setId(currentValue);
            }
            else{
                player.setId(currentValue);
            }
        }
        if (name.equals("name")){
            if(inTeamsSection){
                team.setName(currentValue);
            }
            else{
                player.setName(currentValue);
            }
        }
        if (name.equals("team")){
            teams.add(team);
        }
        if (name.equals("teams")){
            player.setTeams(teams);
            inTeamsSection = false;
        }
    }
}

In my real scenario a player has more child nodes besides teams, and those nodes also contain tags like name and id. As a result I ended up juggling several boolean flags similar to inTeamsSection, and my endElement method has become long and complex, with many conditions.

What should I do differently? How can I know what a name tag, for instance, belongs to?

Thanks!

Recommended answer

There is one neat trick when writing a SAX parser: you are allowed to change the ContentHandler of an XMLReader while parsing. This lets you split the parsing logic for different elements into multiple classes, which makes the parsing more modular and reusable. When a handler sees the end tag of its own element, it switches back to its parent handler. How many handlers you implement is up to you. The code would look like this:

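import java.util.LinkedList;
import java.util.List;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class RootHandler extends DefaultHandler {
    private XMLReader reader;
    private List<Team> teams;

    public RootHandler(XMLReader reader) {
        this.reader = reader;
        this.teams = new LinkedList<Team>();
    }

    @Override
    public void startElement(String uri, String localName, String name, Attributes attributes) throws SAXException {
        if (name.equals("team")) {
            // Switch handler to parse the team element
            reader.setContentHandler(new TeamHandler(reader, this));
        }
    }

    // Called by TeamHandler once it has finished building a team
    public void addTeam(Team team) {
        teams.add(team);
    }
}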

public class TeamHandler extends DefaultHandler {
    private XMLReader reader;
    private RootHandler parent;
    private Team team;
    private StringBuilder content;

    public TeamHandler(XMLReader reader, RootHandler parent) {
        this.reader = reader;
        this.parent = parent;
        this.content = new StringBuilder();
        this.team = new Team();
    }

    // characters can be called multiple times per element so aggregate the content in a StringBuilder
    public void characters(char[] ch, int start, int length) throws SAXException {
        content.append(ch, start, length);
    }

    public void startElement(String uri, String localName, String name, Attributes attributes) throws SAXException {
        content.setLength(0);
    }

    public void endElement(String uri, String localName, String name) throws SAXException {
        if (name.equals("name")) {
            team.setName(content.toString());
        } else if (name.equals("team")) {
            parent.addTeam(team);
            // Switch handler back to our parent
            reader.setContentHandler(parent);
        }
    }
}
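
To try the handler-switching idea end to end, here is a minimal, self-contained sketch. It is not the code above verbatim: the class names HandlerSwitchDemo and PlayerHandler, the parse helper, and the sample values (Ann, Reds, Blues) are made up for illustration, and names are collected as plain strings rather than Player and Team objects to keep it short. It also assumes the default, non-namespace-aware SAXParserFactory, so the element name arrives in the qName argument.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class HandlerSwitchDemo {

    // Hypothetical root handler: handles player-level tags and
    // hands every <team> subtree over to a TeamHandler.
    static class PlayerHandler extends DefaultHandler {
        final XMLReader reader;
        final StringBuilder content = new StringBuilder();
        final List<String> playerNames = new ArrayList<>();
        final List<String> teamNames = new ArrayList<>();

        PlayerHandler(XMLReader reader) {
            this.reader = reader;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            content.setLength(0); // forget text that belonged to an earlier element
            if (qName.equals("team")) {
                reader.setContentHandler(new TeamHandler(reader, this));
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            content.append(ch, start, length); // may be called several times per element
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("name")) {
                playerNames.add(content.toString()); // only player names reach this handler
            }
        }
    }

    // Hypothetical child handler: active only between <team> and </team>.
    static class TeamHandler extends DefaultHandler {
        final XMLReader reader;
        final PlayerHandler parent;
        final StringBuilder content = new StringBuilder();

        TeamHandler(XMLReader reader, PlayerHandler parent) {
            this.reader = reader;
            this.parent = parent;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            content.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            content.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("name")) {
                parent.teamNames.add(content.toString());
            } else if (qName.equals("team")) {
                reader.setContentHandler(parent); // give control back to the parent
            }
        }
    }

    // Parses the XML string and returns the root handler holding the collected names.
    static PlayerHandler parse(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            PlayerHandler root = new PlayerHandler(reader);
            reader.setContentHandler(root);
            reader.parse(new InputSource(new StringReader(xml)));
            return root;
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<players><player><id>1</id><name>Ann</name><teams total=\"2\">"
                + "<team><id>10</id><name>Reds</name></team>"
                + "<team><id>11</id><name>Blues</name></team>"
                + "</teams></player></players>";
        PlayerHandler result = parse(xml);
        System.out.println("players: " + result.playerNames);
        System.out.println("teams: " + result.teamNames);
    }
}
```

Because each handler only ever sees the tags inside its own element, neither one needs a boolean flag to decide whether a name belongs to a player or to a team. The same pattern extends to other child nodes by adding one handler per element type.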
