Keep numeric character entity characters such as `&#10; &#13;` when parsing XML in Java

Views: 2001 | Published: 2017/6/25 1:06:26 | Tags: java xml dom unicode sax


Problem description

I am parsing XML in Java that contains character entities such as (but not limited to) `&#10;` `&#13;` `&lt;` `&gt;` (line feed, carriage return, <, >). While parsing, I append the text content of nodes to a StringBuffer so that I can later write it out to a text file.

However, these character references are resolved into the characters they stand for (newlines/whitespace), so the original notation is gone when I write the String to a file or print it out.

How can I keep the original numeric character entity notation when iterating over the nodes of an XML file in Java and storing the text content of the nodes in a String?

Example demo XML file:

<?xml version="1.0" encoding="UTF-8"?>
<ABCD version="2">    
    <Field attributeWithChar="A string followed by special symbols &#13;  &#10;" />
</ABCD>

Example Java code. It loads the XML, iterates over the nodes and collects the text content of each node's attributes into a StringBuffer. After the iteration is over, it writes the StringBuffer to the console and also to a file (but without the `&#10;` `&#13;` symbols).

What would be a way to keep these symbols when storing them in a String? Could you please help me? Thank you.

public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException, TransformerException {   
    DocumentBuilderFactory documentFactory = DocumentBuilderFactory.newInstance();
    Document document = null;
    DocumentBuilder documentBuilder = documentFactory.newDocumentBuilder();
    document = documentBuilder.parse(new File("path/to/demo.xml"));
    StringBuilder sb = new StringBuilder();

    NodeList nodeList = document.getElementsByTagName("*");
    for (int i = 0; i < nodeList.getLength(); i++) {
        Node node = nodeList.item(i);
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            NamedNodeMap nnp = node.getAttributes();
            for (int j = 0; j < nnp.getLength(); j++) {
                sb.append(nnp.item(j).getTextContent());
            }
        }
    }
    System.out.println(sb.toString());

    try (Writer writer = new BufferedWriter(new OutputStreamWriter(
            new FileOutputStream("path/to/demo_output.xml"), "UTF-8"))) {
        writer.write(sb.toString());
    }
}
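
To make the behavior described above visible, here is a minimal sketch. The class name `ResolvedReferenceDemo`, the helper `parseAttribute`, and the attribute name `a` are made up for illustration. It parses a one-element document and prints each character of the attribute value as a code point, so the carriage return and line feed that the parser substituted for `&#13;` and `&#10;` show up as numbers.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class ResolvedReferenceDemo {

    // Hypothetical helper: parses a one-element document and returns
    // the value of its "a" attribute as the DOM reports it.
    static String parseAttribute(String xml) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element root = document.getDocumentElement();
        return root.getAttribute("a");
    }

    public static void main(String[] args) throws Exception {
        String value = parseAttribute("<Field a=\"x&#13;&#10;\" />");
        // Print every character as a code point so control characters are visible.
        for (char c : value.toCharArray()) {
            System.out.print((int) c + " ");
        }
        System.out.println();
    }
}
```

The DOM never stores the reference notation itself, only the resulting characters, which is why the references cannot be recovered from the parsed `Document` afterwards.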

Recommended answer

You need to escape all the XML entities before parsing the file into a Document. You do that by escaping the ampersand `&` itself with its corresponding XML entity `&amp;`. The parser then sees `&amp;#13;` and returns the literal text `&#13;`. Something like:

DocumentBuilder documentBuilder =
        DocumentBuilderFactory.newInstance().newDocumentBuilder();

String xmlContents = new String(Files.readAllBytes(Paths.get("demo.xml")), "UTF-8");

// Escape every '&' so that character references survive parsing as literal text.
Document document = documentBuilder.parse(
        new InputSource(new StringReader(xmlContents.replace("&", "&amp;"))));

Output:
2A string followed by special symbols &#13;  &#10;
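
For completeness, the answer's approach can be put together into a self-contained sketch. This version assumes the XML is already held in a String instead of being read from `demo.xml`; the class name `KeepEntitiesDemo` and the helper `collectAttributeText` are hypothetical. It escapes every `&`, parses the result, and concatenates all attribute values in document order, as the question's loop does.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class KeepEntitiesDemo {

    // Hypothetical helper: escapes every '&' before parsing, then
    // concatenates the attribute values of all elements in document order.
    static String collectAttributeText(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // "&#13;" becomes "&amp;#13;", which the parser turns back into the text "&#13;".
        String escaped = xml.replace("&", "&amp;");
        Document document = builder.parse(new InputSource(new StringReader(escaped)));

        StringBuilder sb = new StringBuilder();
        NodeList nodes = document.getElementsByTagName("*");
        for (int i = 0; i < nodes.getLength(); i++) {
            NamedNodeMap attributes = nodes.item(i).getAttributes();
            for (int j = 0; j < attributes.getLength(); j++) {
                sb.append(attributes.item(j).getNodeValue());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<ABCD version=\"2\">\n"
                + "    <Field attributeWithChar=\"A string followed by special symbols &#13;  &#10;\" />\n"
                + "</ABCD>";
        System.out.println(collectAttributeText(xml));
    }
}
```

One consequence of escaping every ampersand is that named entities such as `&lt;` and `&amp;` also come back as literal text, not as the characters they stand for, so this fits best when the goal is to reproduce the source notation verbatim.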

