Keep numeric character entity characters such as `&#10; &#13;` when parsing XML in Java
Problem description
I am parsing XML in Java that contains numeric character entity characters such as (but not limited to) `&#10; &#13; &lt; &gt;` (line feed, carriage return, <, >). While parsing, I append the text content of nodes to a StringBuffer so that I can write it out to a text file later.
However, the parser resolves these character references into the characters they stand for, so they show up as newlines/whitespace when I write the String to a file or print it out.
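The behavior described above can be reproduced in a few lines. This is a minimal sketch, assuming an in-memory document instead of a file; the class name `ResolvedReferenceDemo`, the helper `parseAttribute`, and the attribute name `b` are hypothetical and exist only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ResolvedReferenceDemo {

    // Hypothetical helper: parses a small document and returns the value
    // of the attribute "b" on the root element.
    public static String parseAttribute(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getAttribute("b");
    }

    public static void main(String[] args) throws Exception {
        String value = parseAttribute("<a b=\"x&#10;y\"/>");
        // The parser has already replaced the reference with the character itself.
        System.out.println(value.contains("&#10;")); // prints false
        System.out.println((int) value.charAt(1));   // prints 10 (line feed)
    }
}
```

By the time the DOM is built, the original `&#10;` notation no longer exists in the attribute value, which is why it cannot be recovered from the node afterwards.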
How can I keep the original numeric character entity notation when iterating over the nodes of an XML file in Java and storing the text content of the nodes in a String?
Example demo XML file:
<?xml version="1.0" encoding="UTF-8"?>
<ABCD version="2">
<Field attributeWithChar="A string followed by special symbols &#10;&#13;" />
</ABCD>
Example Java code. It loads the XML, iterates over the elements and collects the value of each attribute into a StringBuilder. After the iteration is over, it writes the collected text to the console and also to a file (but without the `&#10;&#13;` symbols).
What would be a way to keep these symbols when storing them in a String? Could you please help me? Thank you.
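import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class Demo { // class name is illustrative
    public static void main(String[] args)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory documentFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder documentBuilder = documentFactory.newDocumentBuilder();
        Document document = documentBuilder.parse(new File("path/to/demo.xml"));

        // Collect the value of every attribute of every element.
        StringBuilder sb = new StringBuilder();
        NodeList nodeList = document.getElementsByTagName("*");
        for (int i = 0; i < nodeList.getLength(); i++) {
            Node node = nodeList.item(i);
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                NamedNodeMap nnp = node.getAttributes();
                for (int j = 0; j < nnp.getLength(); j++) {
                    sb.append(nnp.item(j).getTextContent());
                }
            }
        }

        System.out.println(sb.toString());
        try (Writer writer = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream("path/to/demo_output.xml"), "UTF-8"))) {
            writer.write(sb.toString());
        }
    }
}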
Recommended answer
You need to escape all the XML entities before parsing the file into a Document. You do that by escaping the ampersand `&` itself with its corresponding XML entity `&amp;`. Something like,
DocumentBuilder documentBuilder =
        DocumentBuilderFactory.newInstance().newDocumentBuilder();
String xmlContents = new String(Files.readAllBytes(Paths.get("demo.xml")), "UTF-8");
// Turn every "&" into "&amp;" so the parser returns "&#10;" as literal text.
Document document = documentBuilder.parse(
        new InputSource(new StringReader(xmlContents.replaceAll("&", "&amp;"))
));
Output:
2A string followed by special symbols &#10;&#13;
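For reference, the answer's approach can be combined with the attribute loop from the question into one self-contained program. This is a sketch under stated assumptions: the XML is supplied as an in-memory string rather than read from `demo.xml`, and the class name `EntityPreservingDemo` and the helper `collectAttributeValues` are hypothetical names introduced for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class EntityPreservingDemo {

    // Hypothetical helper: escapes every '&' before parsing, then concatenates
    // the values of all attributes of all elements in document order.
    public static String collectAttributeValues(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // "&#10;" becomes "&amp;#10;", which the parser reads as the literal text "&#10;".
        String escaped = xml.replace("&", "&amp;");
        Document document = builder.parse(new InputSource(new StringReader(escaped)));

        StringBuilder sb = new StringBuilder();
        NodeList elements = document.getElementsByTagName("*");
        for (int i = 0; i < elements.getLength(); i++) {
            NamedNodeMap attributes = elements.item(i).getAttributes();
            for (int j = 0; j < attributes.getLength(); j++) {
                sb.append(attributes.item(j).getNodeValue());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<ABCD version=\"2\">"
                + "<Field attributeWithChar=\"A string followed by special symbols &#10;&#13;\" />"
                + "</ABCD>";
        // Prints: 2A string followed by special symbols &#10;&#13;
        System.out.println(collectAttributeValues(xml));
    }
}
```

Note that this escaping is indiscriminate: named references such as `&lt;` and `&amp;` are also kept as written, and an ampersand inside a CDATA section or comment would be rewritten too, so the sketch is only suitable for input where that is acceptable.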