XSLT - normalize non-breaking whitespace characters

Question

I have a sample XML file like this:

<doc>
    <p>text1 text2  </p>
    <p>text1 text2     </p>
    <p>text1 text2   </p>
</doc>

In this sample XML, the first <p> ends with space characters (&#x0020;), the second <p> with tab characters (&#x9;), and the third <p> with non-breaking space characters (&#x00A0;).

I need to remove any whitespace that appears just before the closing tag.

So the expected output should be:

<doc>
    <p>text1 text2</p>
    <p>text1 text2</p>
    <p>text1 text2</p>
</doc>

Using XSLT's normalize-space(), I can remove the unwanted space and tab characters, but not the non-breaking space characters:

<xsl:template match="p/text()">
    <xsl:value-of select="normalize-space()"/>
</xsl:template>
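
To see why the non-breaking space survives, here is a small Python sketch that emulates XPath's normalize-space() (it is not XSLT, and the name xpath_normalize_space is hypothetical). It assumes the XPath definition of whitespace, which covers only space (#x20), tab (#x9), carriage return (#xD) and line feed (#xA); U+00A0 is not in that set, so it is left untouched.

```python
import re

# XPath's normalize-space() only treats XML whitespace as whitespace:
# space (#x20), tab (#x9), carriage return (#xD) and line feed (#xA).
# U+00A0 NO-BREAK SPACE is not in this set.
XML_WHITESPACE_RUN = re.compile(r"[\x20\x09\x0D\x0A]+")


def xpath_normalize_space(s: str) -> str:
    """Emulate normalize-space(): collapse runs of XML whitespace to one
    space, then trim. strip(" ") is used instead of strip() because
    Python's strip() would also remove U+00A0."""
    return XML_WHITESPACE_RUN.sub(" ", s).strip(" ")


for text in ("text1 text2  ", "text1 text2\t\t", "text1 text2\u00a0\u00a0"):
    print(repr(xpath_normalize_space(text)))
```

The first two samples come back as 'text1 text2', while the third keeps its trailing non-breaking spaces, which is exactly the behavior described in the question.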

Any suggestions on how to normalize all whitespace in XSLT, including non-breaking spaces?

Recommended answer

You can use:

<xsl:value-of select="normalize-space(translate(., '&#160;', ' '))"/>

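translate() first replaces each non-breaking space (&#160;) with an ordinary space, which normalize-space() then strips along with the other whitespace. This works in XSLT 1.0 and 2.0 alike.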
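
As a rough illustration of why the order matters (translate first, normalize second), here is a Python sketch that emulates normalize-space(translate(., '&#160;', ' ')). It is an emulation, not XSLT; the function names are hypothetical, and str.replace stands in for translate() because only a single character is being mapped.

```python
import re


def xpath_normalize_space(s: str) -> str:
    """Emulate XPath normalize-space(): XML whitespace only (#x20, #x9,
    #xD, #xA); strip(" ") keeps Python from also trimming U+00A0."""
    return re.sub(r"[\x20\x09\x0D\x0A]+", " ", s).strip(" ")


def normalize_including_nbsp(s: str) -> str:
    # translate(., '&#160;', ' '): map every NO-BREAK SPACE to a space...
    translated = s.replace("\u00a0", " ")
    # ...then normalize-space() trims and collapses it like any other space.
    return xpath_normalize_space(translated)


for text in ("text1 text2  ", "text1 text2\t\t", "text1 text2\u00a0\u00a0"):
    print(repr(normalize_including_nbsp(text)))  # 'text1 text2' each time
```

Note that normalize-space() also trims leading whitespace and collapses internal runs to a single space, which is harmless for the expected output here.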

In XSLT 2.0, you could also use a regular expression, for example:

<xsl:value-of select="replace(., '[\t\p{Zs}]+$', '')"/>

will remove any trailing run of horizontal tab characters and characters in the Unicode Space_Separator category (\p{Zs}), which includes not only the space and non-breaking space characters but also other space characters. The +$ restricts the match to the end of the string, so the space between text1 and text2 is kept. Documentation is hard to find, but I believe this is currently the complete list (extracted from http://www.unicode.org/Public/UNIDATA/UnicodeData.txt):

&#x0020; SPACE
&#x00A0; NO-BREAK SPACE
&#x1680; OGHAM SPACE MARK
&#x2000; EN QUAD
&#x2001; EM QUAD
&#x2002; EN SPACE
&#x2003; EM SPACE
&#x2004; THREE-PER-EM SPACE
&#x2005; FOUR-PER-EM SPACE
&#x2006; SIX-PER-EM SPACE
&#x2007; FIGURE SPACE
&#x2008; PUNCTUATION SPACE
&#x2009; THIN SPACE
&#x200A; HAIR SPACE
&#x202F; NARROW NO-BREAK SPACE
&#x205F; MEDIUM MATHEMATICAL SPACE
&#x3000; IDEOGRAPHIC SPACE

&#x10CB0; OLD HUNGARIAN CAPITAL LETTER EZS
&#x10CF0; OLD HUNGARIAN SMALL LETTER EZS
&#x16F36; MIAO LETTER ZSHA
&#x16F3C; MIAO LETTER ZSA
&#x16F3E; MIAO LETTER ZZSA
&#x16F41; MIAO LETTER ZZSYA

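However, testing with Saxon 9.5 shows that the last 6 characters are not matched: http://xsltransform.net/ncntCSo

This is correct behavior rather than a Saxon limitation: although their names contain "ZS", those six are letters (general categories Lu, Ll and Lo), not Space_Separator (Zs) characters. The Space_Separator list therefore consists of the first 17 entries only.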
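
The list can be cross-checked from Python's bundled Unicode database. The sketch below (Python, not XSLT; names such as ZS_CODE_POINTS and strip_trailing_space_separators are hypothetical) collects every Zs code point, reports the category of the six extra letters, and applies the class [\t\p{Zs}] anchored to the end of the string, which is what the question needs. Python's re module has no \p{Zs}, so the class is spelled out; the exact result depends on the Unicode version bundled with your Python build.

```python
import re
import sys
import unicodedata

# Every code point whose general category is Zs (Space_Separator),
# according to the Unicode database bundled with this Python build.
ZS_CODE_POINTS = [
    cp for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)) == "Zs"
]

# The six extra entries from the list: letters whose names contain "ZS".
EXTRA_LETTERS = (0x10CB0, 0x10CF0, 0x16F36, 0x16F3C, 0x16F3E, 0x16F41)

# Mirror [\t\p{Zs}]+ anchored at the end of the string. \Z is used
# because Python's $ would also match just before a final newline.
TRAILING_WS = re.compile(
    "[\t" + "".join(re.escape(chr(cp)) for cp in ZS_CODE_POINTS) + r"]+\Z"
)


def strip_trailing_space_separators(s: str) -> str:
    """Remove a trailing run of tabs and Zs characters; keep inner spaces."""
    return TRAILING_WS.sub("", s)


if __name__ == "__main__":
    for cp in ZS_CODE_POINTS:
        print(f"&#x{cp:04X}; {unicodedata.name(chr(cp))}")
    for cp in EXTRA_LETTERS:
        ch = chr(cp)
        print(f"&#x{cp:04X}; {unicodedata.name(ch)} is {unicodedata.category(ch)}")
    for text in ("text1 text2  ", "text1 text2\t\t", "text1 text2\u00a0\u00a0"):
        print(repr(strip_trailing_space_separators(text)))
```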
