Convert character if codepoint within given range


Question

I have a couple of XML files that contain Unicode characters with codepoint values between 57600 and 58607. Currently these show up in my content as square blocks, and I'd like to convert them to elements.

So what I want to achieve is:

<!-- current input -->
<p>&#58208; Follow the on-screen instructions.</p>
<!-- desired output-->
<p><unichar value="58208"/> Follow the on-screen instructions.</p>
<!-- Where 58208 is the actual codepoint of the unicode character in question -->

I've experimented a bit with the tokenizer, but since you need a reference to split on, this turned out to be overly complicated.

Any advice on how best to tackle this? I've been trying things like the pseudocode below but got stuck (never mind the syntax, I know it isn't valid):

<xsl:template match="text()">
 -> for every character in my string
    -> if string-to-codepoints(current character) greater than 57600 return <unichar value="codepoint value"/>
       else return character
</xsl:template>

Recommended answer

This sounds like a job for xsl:analyze-string (XSLT 2.0), e.g.

<xsl:template match="text()">
  <xsl:analyze-string select="." regex="[&#57600;-&#58607;]">
    <xsl:matching-substring>
       <unichar value="{string-to-codepoints(.)}"/>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <xsl:value-of select="."/>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>

Untested.
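
For context, here is a minimal sketch of how the template above might sit in a complete stylesheet. It assumes an XSLT 2.0 processor (Saxon, for example), since xsl:analyze-string and string-to-codepoints are not available in XSLT 1.0. The identity template and the priority attribute are additions that are not part of the original answer: without an identity template the built-in rules would output only text and drop elements such as <p>, and because text() and node() share the same default priority, an explicit priority avoids an ambiguous-match warning.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption (not in the original answer): identity template that
       copies elements, attributes, comments and processing instructions
       unchanged, so only the text nodes are rewritten. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Text nodes: every character in the range 57600..58607
       (U+E100..U+E4EF) becomes a unichar element holding its decimal
       codepoint; all other text is copied through as-is.
       priority="1" is an assumption added here so that this template
       wins over the identity template for text nodes. -->
  <xsl:template match="text()" priority="1">
    <!-- The XML parser resolves the character references before the
         regex is compiled, so the class is a plain character range. -->
    <xsl:analyze-string select="." regex="[&#57600;-&#58607;]">
      <xsl:matching-substring>
        <!-- Each match is exactly one character, so
             string-to-codepoints(.) yields a single integer. -->
        <unichar value="{string-to-codepoints(.)}"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the sample input from the question, this should yield the desired <p><unichar value="58208"/> Follow the on-screen instructions.</p>. Like the answer itself, treat it as a starting point rather than a verified solution.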
