Problems getting XML node text in StAX XMLStreamConstants.CHARACTERS event


Problem description

While reading an XML file using StAX and XMLStreamReader, I ran into a weird problem. I'm not sure whether it's a bug or whether I'm doing something wrong. I'm still learning StAX.

Here is the problem:

  1. In the XMLStreamConstants.CHARACTERS event, I collect the node text with the XMLStreamReader.getText() method.
  2. If the node text contains &, <, > or even something hidden, it returns only the first part of the text string, e.g. ABC & XYZ returns only ABC.

Simplified Java source code:

    // Start StaX reader
    XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
    try {
        XMLStreamReader xmlStreamReader = xmlInputFactory.createXMLStreamReader(inStream);
        int event = xmlStreamReader.getEventType();
        while (true) {
            switch (event) {
                case XMLStreamConstants.START_ELEMENT:
                    switch (xmlStreamReader.getLocalName()) {
                        case "group":
                        // Do something
                            break;
                        case "source":
                            isSource = true;
                            break;
                        case "target":
                            isTrans = true;
                            break;
                        default:
                            isSource = false;
                            isTrans = false;
                            break;
                    }
                    break;
                case XMLStreamConstants.CHARACTERS:
                    if (srcData != null) {
                        String srcTrns = xmlStreamReader.getText();
                        if (srcTrns != null) {
                            if (isSource) {
                                // Set source text
                                isSource = false;
                            } else if (isTrans) {
                                // Set target text
                                isTrans = false;
                            }
                        }
                    }
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    if (xmlStreamReader.getLocalName().equals("group")) {
                        // Add to return list
                    }
                    break;
            }
            if (!xmlStreamReader.hasNext()) {
                break;
            }
            event = xmlStreamReader.next();
        }
    } catch (XMLStreamException ex) {
        LOG.log(Level.WARNING, ex.getMessage(), MessageFormat.format("{0} {1}", ex.getCause(), ex.getLocation()));
    }

I am not quite sure what exactly I am doing wrong, or how to collect the complete text of the node.

Any suggestions or tips would be a great help as I keep learning StAX. :-)

Recommended answer

I solved the problem after some struggling and research.

The problem was in reading text that contains escaped entity references. You need to set the XMLInputFactory property IS_COALESCING to true on the factory instance before creating the reader:

    xmlInputFactory.setProperty(XMLInputFactory.IS_COALESCING, true);

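Basically, this tells the parser to coalesce adjacent character data. With internal entity references replaced by their respective replacement text (in other words, something like decoding), the whole node text then arrives as a single CHARACTERS event and can be read as normal characters, instead of being split into several events around each entity reference.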
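To see the effect in isolation, here is a minimal sketch that parses a small in-memory document and records one list entry per CHARACTERS event. The class name CoalescingDemo, the helper readChunks, and the sample XML string are made up for illustration and are not part of the original code. Without coalescing, a parser is allowed to deliver an element's text in several chunks, and how many you get depends on the StAX implementation. With IS_COALESCING enabled, adjacent character data should be reported as one event.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class, not part of the original question's code.
class CoalescingDemo {

    // Returns the text of every CHARACTERS event, one list entry per event.
    static List<String> readChunks(String xml, boolean coalescing) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // setProperty is an instance method, so it must be called on the factory object.
        factory.setProperty(XMLInputFactory.IS_COALESCING, coalescing);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> chunks = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    chunks.add(reader.getText());
                }
            }
        } finally {
            reader.close();
        }
        return chunks;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Sample input chosen to match the "ABC & XYZ" case from the question.
        String xml = "<source>ABC &amp; XYZ</source>";
        System.out.println("not coalescing: " + readChunks(xml, false));
        System.out.println("coalescing:     " + readChunks(xml, true));
    }
}
```

If coalescing cannot be enabled, the same result can be had by appending every chunk to a StringBuilder and only clearing the isSource / isTrans flags on END_ELEMENT rather than after the first CHARACTERS event.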
