Encoding special chars in XSLT output

Problem description

I have built a set of scripts, part of which transform XML documents from one vocabulary to a subset of the document in another vocabulary.

For reasons that are opaque to me, but apparently non-negotiable, the target platform (Java-based) requires the output document to have 'encoding="UTF-8"' in the XML declaration, but some special characters within text nodes must be encoded with their hex Unicode value - e.g. '”' must be replaced with '&#x201D;' and so forth. I have not been able to acquire a definitive list of which chars must be encoded, but it does not appear to be as simple as "all non-ASCII".

Currently, I have a horrid mess of VBScript using ADODB to directly check each line of the output file after processing, and replace characters where necessary. This is painfully slow, and unsurprisingly some characters get missed (and are consequently nuked by the target platform).

While I could waste time "refining" the VBScript, the long-term aim is to get rid of that entirely, and I'm sure there must be a faster and more accurate way of achieving this, ideally within the XSLT stage itself.

Can anyone suggest any fruitful avenues of investigation?

(edit: I'm not convinced that character maps are the answer - I've looked at them before, and unless I'm mistaken, since my input could conceivably contain any unicode character, I would need to have a map containing all of them except the ones I don't want encoded...)

Recommended answer

<xsl:output encoding="us-ascii"/>

Tells the serialiser that it has to produce ASCII-compatible output. That should force it to produce character references for all non-ASCII characters in text content and attribute values. (Should there be non-ASCII in other places like tag or attribute names, serialisation will fail.)
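
For context, here is a minimal sketch of where that declaration would sit in a complete stylesheet. The identity template is only a placeholder for the real vocabulary-to-vocabulary transformation, which is not shown in the question, and XSLT 1.0 is an assumption.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Ask the serialiser for ASCII output. Characters in text content and
       attribute values that ASCII cannot represent should then be written
       as numeric character references. -->
  <xsl:output method="xml" encoding="us-ascii"/>

  <!-- Placeholder identity template (assumption): copies every node and
       attribute unchanged. Replace it with the real transformation rules. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Two things would be worth checking against the target platform. Whether the references come out in hex or decimal form is up to the processor, and the XML declaration will most likely name us-ascii instead of UTF-8. Also, xsl:output only takes effect when the XSLT processor itself serialises the result, not when the result is handed over as a DOM tree and written out by other code.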
