Handling carriage return in canonicalization with Java


Question

I am trying to canonicalize an HTML text node with the com/sun/org/apache/xml/internal/security/c14n/Canonicalizer.java class. My input file has a carriage return and a line feed at the end of each line. After canonicalization I expect to see the carriage return transformed into &#xD;. However, the output I get does not contain the carriage return; it contains only the line feed. How should I modify my code to include the carriage return?

Example: my input, with CR and LF at the end of each line:

<MyNode xmlns="http://www.artsince.com/test#">Lqc3EeJlyY45bBm1lha869dkHWw1w+U8A6aKM2Xuwk3yWTjt0A2Wq/25rAncSBQlBGOCyTmhfic9(crlf)
9mWf4mC2Ui6ccLqCMjFR4mDQApkfoTy+Cu2eHul9CRjKa0TqckFv7ryda9V5MHruueXII/V+gPLT(crlf)
c76LsetK8C1434K66+Q=</MyNode>

This is the sample code I use:

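import java.io.File;
import java.io.FileInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import com.sun.org.apache.xml.internal.security.Init;
import com.sun.org.apache.xml.internal.security.c14n.Canonicalizer;

// Parse the input file with a namespace-aware DOM parser.
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new FileInputStream(new File("C:\\text.xml")));

if(!Init.isInitialized())
{
   Init.init();
}

// Select the text nodes of the root element and canonicalize them.
XPath xPath = XPathFactory.newInstance().newXPath();
String expression = "child::*/child::text()";
NodeList textNodeList = (NodeList) xPath.evaluate(expression, doc, XPathConstants.NODESET);
Canonicalizer cn = Canonicalizer.getInstance(Canonicalizer.ALGO_ID_C14N_OMIT_COMMENTS);
byte[] canonn = cn.canonicalizeXPathNodeSet(textNodeList);
System.out.println(new String(canonn, StandardCharsets.UTF_8)); // canonical XML is UTF-8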

The output I get has only LF at the end of each line:

Lqc3EeJlyY45bBm1lha869dkHWw1w+U8A6aKM2Xuwk3yWTjt0A2Wq/25rAncSBQlBGOCyTmhfic9(lf)
9mWf4mC2Ui6ccLqCMjFR4mDQApkfoTy+Cu2eHul9CRjKa0TqckFv7ryda9V5MHruueXII/V+gPLT(lf)
c76LsetK8C1434K66+Q=

However, I expect to see &#xD; followed by LF at the end of each line:

Lqc3EeJlyY45bBm1lha869dkHWw1w+U8A6aKM2Xuwk3yWTjt0A2Wq/25rAncSBQlBGOCyTmhfic9&#xD;(lf)
9mWf4mC2Ui6ccLqCMjFR4mDQApkfoTy+Cu2eHul9CRjKa0TqckFv7ryda9V5MHruueXII/V+gPLT&#xD;(lf)
c76LsetK8C1434K66+Q=


Recommended answer

XML specifies that the input may use any end-of-line style, but that the parser must normalize each of them (CR LF, or a lone CR) to a single line feed (\n, ASCII 10) character. The carriage return is therefore already gone by the time the canonicalizer sees the text node.
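The normalization can be observed directly with the JDK's DOM parser. The following is a minimal sketch, not part of the original question: the class name EolNormalizationDemo and the helper visibleText are made up for illustration, and the (cr)/(lf) markers mirror the notation used in the question.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class: shows what the parser does to line endings.
class EolNormalizationDemo {

    /** Parses the XML string and returns the root element's text with CR and LF made visible. */
    static String visibleText(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            DocumentBuilder db = dbf.newDocumentBuilder();
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            Document doc = db.parse(new ByteArrayInputStream(bytes));
            String text = doc.getDocumentElement().getTextContent();
            return text.replace("\r", "(cr)").replace("\n", "(lf)");
        } catch (Exception e) {
            // Wrapped only to keep the demo's call sites short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Literal CR LF in the input: the parser hands back LF only.
        System.out.println(visibleText("<a>x\r\ny</a>"));    // prints x(lf)y
        // CR written as a character reference: it survives parsing.
        System.out.println(visibleText("<a>x&#13;\ny</a>")); // prints x(cr)(lf)y
    }
}
```

The second call shows why a character reference helps: it is resolved after line-end normalization, so the CR reaches the DOM and the canonicalizer can then write it out as &#xD;.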

If you want to preserve the character, you must replace ASCII 13 with &#13; yourself before the XML parser sees the input. If you use Java, I suggest using a FilterInputStream.
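One way such a filter could look is sketched below. The class name CrEscapingInputStream and its escape helper are hypothetical, not an existing API. The sketch assumes an ASCII-compatible encoding such as UTF-8 (it would corrupt UTF-16 input), and it assumes that carriage returns occur only inside character data or attribute values: a CR inside a tag or before the root element would be turned into a character reference where XML does not allow one.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: rewrites every CR byte (13) as the five bytes "&#13;".
class CrEscapingInputStream extends FilterInputStream {
    private static final byte[] ESCAPED_CR = "&#13;".getBytes(StandardCharsets.US_ASCII);

    // Index of the next escape byte to emit; ESCAPED_CR.length means none pending.
    private int pending = ESCAPED_CR.length;

    CrEscapingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        if (pending < ESCAPED_CR.length) {
            return ESCAPED_CR[pending++]; // finish an escape that is in progress
        }
        int b = super.read();
        if (b == '\r') {
            pending = 1;                  // '&' is returned now, "#13;" follows
            return ESCAPED_CR[0];
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int n = 0;
        while (n < len) {                 // simple byte-wise loop; reads until full or EOF
            int b = read();
            if (b == -1) {
                break;
            }
            buf[off + n++] = (byte) b;
        }
        return n == 0 ? -1 : n;
    }

    @Override
    public boolean markSupported() {
        return false;                     // mark/reset and skip are not handled in this sketch
    }

    /** Convenience for small inputs (assumes UTF-8): returns the escaped text. */
    static String escape(String xml) {
        byte[] input = xml.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new CrEscapingInputStream(new ByteArrayInputStream(input))) {
            byte[] buf = new byte[256];
            for (int n; (n = in.read(buf, 0, buf.length)) != -1; ) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

In the question's code, the filter would wrap the file stream before parsing: db.parse(new CrEscapingInputStream(new FileInputStream(new File("C:\\text.xml")))). The parser then resolves each &#13; to a real CR in the text node, which canonicalization serializes as &#xD;.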
