Why is Apache Xerces/Xalan adding additional carriage returns to my serialized output?


Problem description

I'm using Apache Xerces 2.11.0 and Apache Xalan 2.7.1, and I'm seeing additional carriage return characters in the serialized XML.

I have this (pseudo)code:

String myString = ...;   // contains \r\n line breaks
Document doc = ...;

// Wrap the string in a CDATA section inside a new <item> element
Element item = doc.createElement("item");
item.appendChild(doc.createCDATASection(myString));

// Serialize the DOM tree to a byte stream
Transformer transformer = ...;
ByteArrayOutputStream stream = new ByteArrayOutputStream();
Result result = new StreamResult(stream);
transformer.transform(new DOMSource(doc), result);

myString contains line breaks (\r\n); it is actually Base64-encoded data. But when I look at the serialized output, there are additional \r characters.

Input:

Line 1 \r\n
Line 2 \r\n
Line 3 \r\n

Output:

Line 1 \r\r\n
Line 2 \r\r\n
Line 3 \r\r\n

If I use createTextNode instead of createCDATASection, the output becomes even more interesting:

Line 1 &#13;\r\n
Line 2 &#13;\r\n
Line 3 &#13;\r\n

The additional character seems to be introduced during serialization; the DOM tree itself seems to be correct (according to getTextContent()).

Why is this happening? What can I do to fix this?

Recommended answer

I guess you are having this problem on Windows and not on Linux/Solaris/Mac. The Xalan serializer (org.apache.xml.serializer.ToStream) gets the line separator from System.getProperty("line.separator"). When the serializer writes \r\n, it interprets the \n as the end-of-line sequence and replaces it with the platform line separator, so what it actually writes is \r + lineSeparator = \r\r\n. Although this sounds strange, it is not a bug; see [1]. But because it was frequently reported as a bug, a Xalan extension property was added [2]. So you can set it programmatically:

transformer.setOutputProperty("{http://xml.apache.org/xalan}line-separator","\n");
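For context, here is a minimal sketch of where that call fits in the question's code. The class name LineSeparatorFix and the helper serialize are hypothetical names chosen for illustration. The sketch assumes Apache Xalan is the active JAXP implementation; other implementations, such as the transformer built into the JDK, may accept the namespaced property and silently ignore it.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical class name; only the output property key comes from the answer.
class LineSeparatorFix {
    // Xalan extension property; assumed to take effect only when Apache Xalan does the serializing.
    static final String XALAN_LINE_SEPARATOR = "{http://xml.apache.org/xalan}line-separator";

    // Builds <item><![CDATA[payload]]></item> and serializes it with "\n" as the line separator.
    static String serialize(String payload) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element item = doc.createElement("item");
            item.appendChild(doc.createCDATASection(payload));
            doc.appendChild(item);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            // Ask the serializer to emit a bare "\n" instead of the platform line separator.
            transformer.setOutputProperty(XALAN_LINE_SEPARATOR, "\n");

            ByteArrayOutputStream stream = new ByteArrayOutputStream();
            transformer.transform(new DOMSource(doc), new StreamResult(stream));
            return new String(stream.toByteArray(), StandardCharsets.UTF_8);
        } catch (Exception e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = serialize("Line 1\r\nLine 2\r\n");
        // Print CR and LF as visible escapes so stray \r characters are easy to spot.
        System.out.println(xml.replace("\r", "\\r").replace("\n", "\\n"));
    }
}
```

Printing the control characters as escapes, as main does here, is a convenient way to check whether the extra \r is still present in the output.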

Or, in an XSLT stylesheet:

<xsl:output xalan:line-separator="&#10;" />

where xalan is a namespace prefix bound to the URI "http://xml.apache.org/xalan".
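To see why the separator setting matters, the behaviour described in this answer can be approximated with a small model. This is a simplified sketch, not Xalan's actual code: the class LineSeparatorModel and its methods are hypothetical, and the only assumption is the rule stated above, that each \n in the content is replaced by the configured line separator while \r passes through untouched.

```java
// Hypothetical, simplified model of the substitution rule described in the answer.
class LineSeparatorModel {
    // Replaces every '\n' with lineSeparator; all other characters, including '\r', are copied as-is.
    static String writeWithSeparator(String content, String lineSeparator) {
        StringBuilder out = new StringBuilder();
        for (char c : content.toCharArray()) {
            if (c == '\n') {
                out.append(lineSeparator);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Renders CR and LF as visible escape sequences for printing.
    static String visible(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        String input = "Line 1\r\nLine 2\r\n";
        // Windows-style separator: the existing \r is kept and another one is added.
        System.out.println(visible(writeWithSeparator(input, "\r\n"))); // Line 1\r\r\nLine 2\r\r\n
        // Separator forced to "\n": the content comes out unchanged.
        System.out.println(visible(writeWithSeparator(input, "\n")));   // Line 1\r\nLine 2\r\n
    }
}
```

Under this model, content that already contains \r\n only survives unchanged when the separator is a bare \n, which is what the line-separator property arranges.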

[1] https://issues.apache.org/jira/browse/XALANJ-1660

[2] https://issues.apache.org/jira/browse/XALANJ-2093
