生成XML时如何在CDATA中保留换行符? [英] How to preserve newlines in CDATA when generating XML?
问题描述
我想将包含newline
和tab
等空格字符的一些文本写到xml文件中,所以我使用
I want to write some text that contains whitespace characters such as newline
and tab
into an xml file so I use
Element element = xmldoc.createElement("TestElement");
element.appendChild(xmldoc.createCDATASection(somestring));
但是当我重新阅读使用
Node vs = xmldoc.getElementsByTagName("TestElement").item(0);
String x = vs.getFirstChild().getNodeValue();
我得到的字符串不再有换行符了.
当我直接查看磁盘上的xml时,换行符似乎保留了下来.因此在读取xml文件时会发生此问题.
I get a string that has no newlines anymore.
When i look directly into the xml on disk, the newlines seem preserved. so the problem occurs when reading in the xml file.
如何保留换行符?
谢谢!
推荐答案
我不知道您如何解析和编写文档,但这是基于您的文档的增强代码示例:
I don't know how you parse and write your document, but here's an enhanced code example based on yours:
// creating the document in-memory
Document xmldoc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
Element element = xmldoc.createElement("TestElement");
xmldoc.appendChild(element);
element.appendChild(xmldoc.createCDATASection("first line\nsecond line\n"));
// serializing the xml to a string
DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
DOMImplementationLS impl =
(DOMImplementationLS)registry.getDOMImplementation("LS");
LSSerializer writer = impl.createLSSerializer();
String str = writer.writeToString(xmldoc);
// printing the xml for verification of whitespace in cdata
System.out.println("--- XML ---");
System.out.println(str);
// de-serializing the xml from the string
final Charset charset = Charset.forName("utf-16");
final ByteArrayInputStream input = new ByteArrayInputStream(str.getBytes(charset));
Document xmldoc2 = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(input);
Node vs = xmldoc2.getElementsByTagName("TestElement").item(0);
final Node child = vs.getFirstChild();
String x = child.getNodeValue();
// print the value, yay!
System.out.println("--- Node Text ---");
System.out.println(x);
使用LSSerializer进行序列化是W3C的方法(看到这里).输出与预期的一样,并带有行分隔符:
The serialization using LSSerializer is the W3C way to do it (see here). The output is as expected, with line separators:
--- XML ---
<?xml version="1.0" encoding="UTF-16"?>
<TestElement><![CDATA[first line
second line ]]></TestElement>
--- Node Text ---
first line
second line
这篇关于生成XML时如何在CDATA中保留换行符?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!