Reading XML document nodes containing special characters (&, -, etc) with Java


Question

My code does not retrieve the full text of element nodes that contain special characters. For example, for this node:

<theaterName>P&G Greenbelt</theaterName>

It retrieves only "P" because of the ampersand. I need to retrieve the entire string.

Here is my code:

public List<String> findTheaters() {

    //Clear theaters application global
    FilmhopperActivity.tData.clearTheaters();

    ArrayList<String> theaters = new ArrayList<String>();

    NodeList theaterNodes = doc.getElementsByTagName("theaterName");

    for (int i = 0; i < theaterNodes.getLength(); i++) {

        Node node = theaterNodes.item(i);
        if (node.getNodeType() == Node.ELEMENT_NODE) {

            //Found theater, add to return array
            Element element = (Element) node;
            NodeList children = element.getChildNodes();
            String name = children.item(0).getNodeValue();
            theaters.add(name);

            //Logging
            android.util.Log.i("MoviefoneFetcher", "Theater found: " + name);

            //Add theater to application global
            Theater t = new Theater(name);
            FilmhopperActivity.tData.addTheater(t);
        }
    }

    return theaters;
}

I tried extending the name string by concatenating the remaining children.item(j) values, but it didn't work. I'd only get "P&".

...
String name = children.item(0).getNodeValue();
for (int j = 1; j < children.getLength() - 1; j++) {
    name += children.item(j).getNodeValue();
}

Thanks for your time.

UPDATE: Found a function called normalize() that you can call on Nodes. It combines all adjacent text child nodes, so children.item(0) then contains the text of all the children, including ampersands!
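The update above can be sketched roughly as follows. This is a minimal illustration rather than the asker's actual code: the class name TheaterTextDemo, the helper names, and the sample XML are made up for the example. It also uses getTextContent(), which is not mentioned in the original post; it concatenates all descendant text and so gives the full value even when the text was split into several nodes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the original question.
class TheaterTextDemo {

    // Parses an XML string into a DOM document.
    // Checked exceptions are wrapped to keep the sketch short.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Returns the full text of every <theaterName> element.
    static List<String> theaterNames(Document doc) {
        // Merge adjacent text nodes throughout the tree, as in the update.
        doc.getDocumentElement().normalize();

        List<String> names = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName("theaterName");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element element = (Element) nodes.item(i);
            // getTextContent() joins all descendant text, so it does not
            // depend on the text living in a single child node.
            names.add(element.getTextContent());
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<theaters><theaterName>P&amp;G Greenbelt</theaterName></theaters>";
        System.out.println(theaterNames(parse(xml)));
    }
}
```

Either normalize() or getTextContent() alone should be enough here; the sketch shows both only to make the two options visible side by side.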

Recommended answer

The & is an escape character in XML. XML that looks like this:

<theaterName>P&G Greenbelt</theaterName>

should actually be rejected by the parser. Instead, it should look like this:

<theaterName>P&amp;G Greenbelt</theaterName>
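One way to check this claim is to feed both forms to a parser and see which one it accepts. The sketch below assumes a standard JAXP DocumentBuilder; the class name WellFormedCheck and the method isWellFormed are hypothetical names chosen for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper; not part of the original answer.
class WellFormedCheck {

    // Returns true if the parser accepts the string as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of logging them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Malformed input, such as a bare ampersand, ends up here.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Parser setup or I/O failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<theaterName>P&G Greenbelt</theaterName>"));
        System.out.println(isWellFormed("<theaterName>P&amp;G Greenbelt</theaterName>"));
    }
}
```

If the first call reports the input as well-formed in your environment, the parser is more lenient than the XML specification requires, which would be worth knowing when debugging.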

XML has five predefined entities for such characters: & (&amp;), < (&lt;), > (&gt;), " (&quot;) and ' (&apos;). There are also other ways to escape characters, such as via their Unicode value, as in &#x2022; or &#12345;.
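If you control the code that produces the XML, the escaping rules above could be applied with a small helper like the one below. This is only a sketch: XmlEscaper and escape are hypothetical names, and in practice an XML writer library would normally do this for you.

```java
// Hypothetical helper; not part of the original answer.
class XmlEscaper {

    // Replaces the five characters that have predefined XML entities.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Wrap the escaped text in the element from the question.
        String name = "P&G Greenbelt";
        System.out.println("<theaterName>" + escape(name) + "</theaterName>");
    }
}
```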

For more information, the XML specification is fairly clear.

Depending on how your tree was constructed, the other possibility is that the character is escaped properly in the source, and the sample you showed isn't the raw XML but how the data is represented in the tree.

For example, when using SAX to build a tree, entities (the &-thingies) are broken apart and delivered separately. This is because the SAX parser tries to return contiguous chunks of data; when it reaches an escape sequence, it sends what it has and starts a new chunk with the translated value. So you might need to combine consecutive text nodes in your tree to get the whole value.
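When handling SAX events directly, the usual way to cope with this chunking is to accumulate every characters() call in a buffer and read the buffer only when the element ends. The following is a minimal sketch of that idea, assuming the theaterName element from the question; the class TheaterNameHandler and its read method are hypothetical names for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler; not part of the original answer.
class TheaterNameHandler extends DefaultHandler {
    private final List<String> names = new ArrayList<>();
    private final StringBuilder buffer = new StringBuilder();
    private boolean inTheaterName = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("theaterName".equals(qName)) {
            inTheaterName = true;
            buffer.setLength(0); // start a fresh buffer for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element, for example once per
        // chunk around an entity, so append instead of overwriting.
        if (inTheaterName) {
            buffer.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("theaterName".equals(qName)) {
            names.add(buffer.toString()); // all chunks are now combined
            inTheaterName = false;
        }
    }

    // Parses the XML string and returns the collected theater names.
    static List<String> read(String xml) {
        TheaterNameHandler handler = new TheaterNameHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return handler.names;
    }

    public static void main(String[] args) {
        System.out.println(read("<theaters><theaterName>P&amp;G Greenbelt</theaterName></theaters>"));
    }
}
```

The same append-until-end-of-element pattern applies regardless of how many chunks a particular parser chooses to deliver.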
