Disable automatic ampersand escaping in XML?

Question

Consider:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.newDocument();

Element root = doc.createElement("list");
doc.appendChild(root);

for(CorrectionEntry correction : dictionary){
    Element elem = doc.createElement("elem");
    elem.setAttribute("from", correction.getEscapedFrom());
    elem.setAttribute("to", correction.getEscapedTo());
    root.appendChild(elem);
}

(The document is then written to an XML file.)

Here getEscapedFrom and getEscapedTo return (in my code) something like fink&#xE9; if the originating word is finké, so as to perform a Unicode escape for characters above 127.

The problem is that the final XML contains the line <elem from="finke" to="fink&amp;#xE9;" /> (from is finke, to is finké), where I would like it to be <elem from="finke" to="fink&#xE9;" />

Following another StackOverflow answer, I tried to disable ampersand escaping by adding the line doc.appendChild(doc.createProcessingInstruction(StreamResult.PI_DISABLE_OUTPUT_ESCAPING, "&")); after creating doc, but without success.

How could I "tell XML" not to escape ampersands? Or, conversely, how could I get "XML" to convert é, or \u00E9, to &#xE9;?

I managed to narrow down the problem: up until the file is written, the node (as seen in the debugger) seems to contain the right string. Once I call transformer.transform(domSource, streamResult); everything goes wild.

DOMSource domSource = new DOMSource(doc);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
StreamResult streamResult = new StreamResult(baos);
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.transform(domSource, streamResult);
System.out.println(baos.toString());

The problem seems to be the transformer.

Recommended answer

Try calling setOutputProperty("encoding", "us-ascii") on the transformer. That tells the serializer to produce the output using ASCII characters only, which means any non-ASCII character will be escaped. But you can't control whether it will be a decimal or hex escape (unless you use Saxon-PE or higher as your Transformer, in which case there's a serialization option to control this).
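As a rough illustration of the suggestion above, here is a minimal sketch. It assumes the DOM holds the real character (é) rather than a pre-escaped string, and it uses the JDK's default Transformer. The class name AsciiSerializationSketch and the helper serialize are hypothetical names made up for this example and are not part of the question's code.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class AsciiSerializationSketch {

    // Hypothetical helper: builds <list><elem from=".." to=".."/></list>
    // and serializes it with an ASCII-only output encoding.
    static String serialize(String from, String to) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.newDocument();

            Element root = doc.createElement("list");
            doc.appendChild(root);

            // Store the plain, unescaped strings; the serializer does the escaping.
            Element elem = doc.createElement("elem");
            elem.setAttribute("from", from);
            elem.setAttribute("to", to);
            root.appendChild(elem);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // ASCII-only output: characters outside ASCII cannot be written
            // directly, so the serializer emits character references for them.
            transformer.setOutputProperty(OutputKeys.ENCODING, "US-ASCII");

            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            transformer.transform(new DOMSource(doc), new StreamResult(baos));
            return new String(baos.toByteArray(), StandardCharsets.US_ASCII);
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("XML serialization failed", e);
        }
    }

    public static void main(String[] args) {
        // "fink\u00E9" is the word finké with the real character, not "fink&#xE9;".
        System.out.println(serialize("finke", "fink\u00E9"));
    }
}
```

With this approach the to attribute should come out as a numeric character reference instead of fink&amp;#xE9;. Whether that reference is decimal or hexadecimal depends on the serializer in use. Any conforming XML parser reads both forms back as the same character.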

It's never a good idea to try to do the serialization "by hand", for at least three reasons: (a) you'll get it wrong (we see a lot of SO questions caused by people producing bad XML this way), (b) you should be working with the tools, not against them, (c) the people who wrote the serializers understand XML better than you do, and they know what's expected of them. You're probably working to requirements written by someone whose understanding of XML is very superficial.
