Replace special characters like &ndash; and &mdash; occurring in an XML document with corresponding codes like &#150;
Question
I wish to replace special characters like &ndash; and &mdash; occurring in an XML document with the corresponding codes, such as &#150;.
I have an input XML document containing several of these special characters:
<?xml version="1.0"?>
<!DOCTYPE BOOK SYSTEM "bookfull.dtd">
<BOOK>
<P>The war was between1890&ndash;1900
<AF>something&mdash;something else</AF>
</P>
</BOOK>
There are several other characters as well, such as &rsquo; for the single quotation mark.
My XSLT code is as follows:
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns="http://www.w3.org/1999/xhtml">
<xsl:output method="html" omit-xml-declaration="yes" indent="yes" />
<xsl:strip-space elements="*" />
<xsl:param name="pDest"
select="'file:///d:/LWW_Book_ePub_Transform/Epub_ZipCreation/XSLT_Transform/Output/'" />
<xsl:template match="P">
<html>
<xsl:apply-templates/>
</html>
</xsl:template>
<xsl:template match="AF">
.....
<xsl:apply-templates/>
.....
</xsl:template>
</xsl:stylesheet>
My Java code for running the transformation is as follows (I am using Saxon 9):
package com.xsltprocessor;
import java.io.File;
import java.io.FileInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Source;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
public class ParseUsingSAX {
public ParseUsingSAX() {
}
public void parseBookContent(String xsltFile) {
try {
// Input document to transform; the variable is used further down, so it must be declared
File inputXml = new File("D:\\data\\myxml.0f");
File xslt = new File(xsltFile);
TransformerFactory factory = TransformerFactory.newInstance();
Templates template = factory.newTemplates(new StreamSource(new FileInputStream(xslt)));
Transformer xformer = template.newTransformer();
Source source = new StreamSource(new FileInputStream(inputXml));
// A StreamResult needs a target; here the serialized result goes to standard output
StreamResult result = new StreamResult(System.out);
xformer.transform(source,result);
System.out.println("DONE");
}
catch (Exception ex) {
// TODO Auto-generated catch block
ex.printStackTrace();
System.out.println("IO exception: " + ex.getMessage());
}
}
}
The output I get after the transformation:
<html>
The war was between1890–1900
</html>
Expected output:
<html>
The war was between1890&#150;1900
</html>
Recommended answer
Use an xsl:character-map element, which controls output serialization.
<xsl:character-map name="dashes">
<xsl:output-character character="&#x2013;" string="&amp;#150;"/>
</xsl:character-map>
You must also declare
<xsl:output use-character-maps="dashes"/>
as a top-level element to ensure that the character map is used.
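To make the idea concrete, here is a minimal plain-Java sketch of what a character map does conceptually at serialization time: each mapped character is replaced by a fixed string, and everything else passes through unchanged. It is not Saxon's implementation; the class name `CharacterMapSketch`, the table `DASHES` and the method `applyMap` are hypothetical names chosen for illustration. The codes &#150; and &#151; are taken from the question (they are the Windows-1252 positions of the two dashes; the Unicode code points are U+2013 and U+2014).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CharacterMapSketch {

    // Hypothetical mapping table, analogous to the xsl:output-character entries.
    static final Map<Integer, String> DASHES = new LinkedHashMap<>();
    static {
        DASHES.put(0x2013, "&#150;"); // en dash -> code requested in the question
        DASHES.put(0x2014, "&#151;"); // em dash -> assumed analogous code
    }

    // Replaces every mapped code point with its string; copies all others as they are.
    static String applyMap(String text, Map<Integer, String> map) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            String replacement = map.get(cp);
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: The war was between1890&#150;1900
        System.out.println(applyMap("The war was between1890\u20131900", DASHES));
    }
}
```

The difference from the real thing is where the replacement happens: with xsl:character-map the serializer does it while writing the result, so no post-processing pass over the output is needed.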
As I mentioned in my comments, &ndash; is an HTML named entity that needs to be declared in XSLT. See e.g. this discussion for more detail.
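The effect of declaring the entity can be checked in isolation with the JDK's built-in XML parser. The sketch below assumes a small inline document with an internal DTD subset that declares ndash. The class name `EntityDeclarationSketch` and the method `parseText` are hypothetical. Once the entity is declared, the parser expands &ndash; to the character U+2013 before any stylesheet sees it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EntityDeclarationSketch {

    // Parses the XML string and returns the text content of the root element.
    static String parseText(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        // Internal subset declaring the named entity, as in a stylesheet DOCTYPE.
        String xml = "<!DOCTYPE P [ <!ENTITY ndash \"&#x2013;\"> ]>"
                   + "<P>1890&ndash;1900</P>";
        String text = parseText(xml);
        // The fifth character is the expanded en dash; prints 8211 (0x2013).
        System.out.println((int) text.charAt(4));
    }
}
```

This is why a character map keyed on the en dash character can catch text that was written as &ndash; in the source: by serialization time only the character itself is left.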
Embedded into the stylesheet you show (this outputs the dummy strings "MDASH" and "NDASH", just for illustration):
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE stylesheet [
<!ENTITY ndash "&#x2013;" >
<!ENTITY mdash "&#x2014;" >
]>
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns="http://www.w3.org/1999/xhtml">
<xsl:output method="html" omit-xml-declaration="yes" indent="yes" />
<xsl:output use-character-maps="dashes"/>
<xsl:strip-space elements="*" />
<xsl:character-map name="dashes">
<xsl:output-character character="&ndash;" string="NDASH"/>
<xsl:output-character character="&mdash;" string="MDASH"/>
</xsl:character-map>
<xsl:param name="pDest"
select="'file:///d:/LWW_Book_ePub_Transform/Epub_ZipCreation/XSLT_Transform/Output/'" />
<xsl:template match="BOOK">
<html>
<xsl:apply-templates/>
</html>
</xsl:template>
<xsl:template match="AF|P">
<xsl:copy>
<xsl:value-of select="."/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Note that this does not have an effect on output produced with xsl:result-document (since you did not show your entire stylesheet). For more info on character maps, please refer to a previous answer of mine and the official recommendation.