Problems getting XML node text in StAX XMLStreamConstants.CHARACTERS event
Problem description
While reading an XML file using StAX and XMLStreamReader, I encountered a weird problem. I am not sure whether it's a bug or I am doing something wrong. I am still learning StAX.
Here is the problem:
- In the XMLStreamConstants.CHARACTERS event, I collect the node text with the XMLStreamReader.getText() method.
- If the node text contains &, <, > or even something hidden, getText() returns only the first part of the text string. For example, ABC & XYZ returns only ABC.
Simplified Java source code:
// Start StAX reader
XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
try {
    XMLStreamReader xmlStreamReader = xmlInputFactory.createXMLStreamReader(inStream);
    int event = xmlStreamReader.getEventType();
    while (true) {
        switch (event) {
            case XMLStreamConstants.START_ELEMENT:
                switch (xmlStreamReader.getLocalName()) {
                    case "group":
                        // Do something
                        break;
                    case "source":
                        isSource = true;
                        break;
                    case "target":
                        isTarget = true;
                        break;
                    default:
                        isSource = false;
                        isTarget = false;
                        break;
                }
                break;
            case XMLStreamConstants.CHARACTERS:
                if (srcData != null) {
                    String srcTrns = xmlStreamReader.getText();
                    if (srcTrns != null) {
                        if (isSource) {
                            // Set source text
                            isSource = false;
                        } else if (isTarget) {
                            // Set target text
                            isTarget = false;
                        }
                    }
                }
                break;
            case XMLStreamConstants.END_ELEMENT:
                if (xmlStreamReader.getLocalName().equals("group")) {
                    // Add to return list
                }
                break;
        }
        if (!xmlStreamReader.hasNext()) {
            break;
        }
        event = xmlStreamReader.next();
    }
} catch (XMLStreamException ex) {
    LOG.log(Level.WARNING, ex.getMessage(), MessageFormat.format("{0} {1}", ex.getCause(), ex.getLocation()));
}
I am not quite sure what exactly I am doing wrong or how to collect the complete text of the node.
Any suggestions or tips would be a great help as I continue learning StAX. :-)
Recommended answer
I solved the problem after some struggling and research.
It was a problem with reading text that contains escaped entity references. You need to set the XMLInputFactory.IS_COALESCING property to true on the factory instance before creating the reader:
xmlInputFactory.setProperty(XMLInputFactory.IS_COALESCING, true);
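Basically, this tells the parser to coalesce adjacent character data into a single CHARACTERS event. Without it, the parser may split the text at an entity reference such as &amp; and report it as several CHARACTERS events, so a single getText() call sees only the first piece. With coalescing on, the entity references are replaced by their text and the whole node text is read as normal characters in one event.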
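To see the effect in isolation, here is a minimal sketch that reads the first CHARACTERS event of a small XML string with and without coalescing. The class name CoalescingDemo, the helper firstCharacters, and the sample XML are made up for illustration; they are not from the original code. How the text is split without coalescing depends on the StAX implementation in use, so only the coalescing case has a predictable result.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class CoalescingDemo {

    // Hypothetical helper: returns the text of the first CHARACTERS event, or null if there is none.
    static String firstCharacters(String xml, boolean coalescing) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // The property must be set on the factory instance before the reader is created.
        factory.setProperty(XMLInputFactory.IS_COALESCING, coalescing);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    return reader.getText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<source>ABC &amp; XYZ</source>";
        // With coalescing, the whole node text arrives in one event: "ABC & XYZ".
        System.out.println("[" + firstCharacters(xml, true) + "]");
        // Without coalescing, the first event may hold only part of the text.
        System.out.println("[" + firstCharacters(xml, false) + "]");
    }
}
```

If changing the factory settings is not an option, appending each CHARACTERS chunk to a StringBuilder until the matching END_ELEMENT is another common way to collect the full text.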