Wrapping words from HTML using XSL


Problem description


I need to wrap each word with a tag (e.g. span) in an HTML document, like:

<html>
<head>
    <title>It doesnt matter</title>
</head>
<body>
         <div> Text in a div </div>
         <div>
    Text in a div
    <p>
        Text inside a p
    </p>
     </div>
</body>
</html>

The result should look something like this:

<html>
<head>
    <title>It doesnt matter</title>
</head>
<body>
         <div> <span>Text </span> <span> in </span> <span> a </span> <span> div </span> </div>
         <div>

             <span>Text </span> <span> in </span> <span> a </span> <span> div </span>                     
             <p>
               <span>Text </span> <span> in </span> <span> a </span> <span> p </span> 
             </p>
     </div>
</body>
</html>

It's important to keep the structure of the body...

Any help?

Thank you!

Solution

All three solutions below use the XSLT design pattern of overriding the identity rule: the identity rule copies the structure and contents of the XML document unchanged, and more specific templates modify only the nodes of interest.
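As a minimal sketch of the pattern itself (an illustration only, not one of the solutions), the stylesheet below combines the identity rule with a single override. The override here is deliberately trivial: it wraps each whole text node outside title in one span, without splitting it into words.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
 <!-- Drop whitespace-only text nodes so they are not wrapped -->
 <xsl:strip-space elements="*"/>

 <!-- Identity rule: copies every node and attribute unchanged -->
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <!-- Override: this pattern has a higher default priority than
      node(), so it wins for text nodes outside title.
      Illustration only: the whole text node goes into one span. -->
 <xsl:template match="*[not(self::title)]/text()">
  <span><xsl:value-of select="normalize-space(.)"/></span>
 </xsl:template>
</xsl:stylesheet>
```

Each solution that follows keeps the identity rule as is and changes only what the overriding template does with the text node.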

I. XSLT 1.0 solution:

This short and simple transformation (no <xsl:choose> used anywhere):

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

 <xsl:template match="*[not(self::title)]/text()"
               name="split">
  <xsl:param name="pText" select=
       "concat(normalize-space(.), ' ')"/>

  <xsl:if test="string-length(normalize-space($pText)) >0">
   <span>
   <xsl:value-of select=
        "substring-before($pText, ' ')"/>
   </span>

   <xsl:call-template name="split">
    <xsl:with-param name="pText"
         select="substring-after($pText, ' ')"/>
   </xsl:call-template>
  </xsl:if>
 </xsl:template>
</xsl:stylesheet>

when applied to the provided XML document:

<html>
    <head>
        <title>It doesnt matter</title>
    </head>
    <body>
        <div> Text in a div </div>
        <div>
         Text in a div
            <p>
             Text inside a p
         </p>
        </div>
    </body>
</html>

produces the wanted, correct result:

<html>
   <head>
      <title>It doesnt matter</title>
   </head>
   <body>
      <div>
         <span>Text</span>
         <span>in</span>
         <span>a</span>
         <span>div</span>
      </div>
      <div>
         <span>Text</span>
         <span>in</span>
         <span>a</span>
         <span>div</span>
         <p>
            <span>Text</span>
            <span>inside</span>
            <span>a</span>
            <span>p</span>
         </p>
      </div>
   </body>
</html>
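Since the question asks for a tag "e.g. span", the recursion can also be written with the wrapper element name as a parameter. The following is a sketch under assumptions: the template name wrapWords and the parameter pElem are hypothetical names chosen for this illustration and do not appear in the answer above.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <!-- Identity rule -->
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <!-- Hand every text node outside title to the named template -->
 <xsl:template match="*[not(self::title)]/text()">
  <xsl:call-template name="wrapWords">
   <xsl:with-param name="pText" select="normalize-space(.)"/>
  </xsl:call-template>
 </xsl:template>

 <!-- wrapWords (hypothetical name): emits the first word wrapped
      in an element named by pElem, then recurses on the rest -->
 <xsl:template name="wrapWords">
  <xsl:param name="pText"/>
  <xsl:param name="pElem" select="'span'"/>

  <xsl:if test="string-length($pText) > 0">
   <!-- Appending a space guarantees substring-before finds one,
        even for the last word -->
   <xsl:element name="{$pElem}">
    <xsl:value-of select="substring-before(concat($pText, ' '), ' ')"/>
   </xsl:element>

   <xsl:call-template name="wrapWords">
    <xsl:with-param name="pText" select="substring-after($pText, ' ')"/>
    <xsl:with-param name="pElem" select="$pElem"/>
   </xsl:call-template>
  </xsl:if>
 </xsl:template>
</xsl:stylesheet>
```

Passing a different value for pElem in the call (for example 'b') would switch the wrapper element without touching the recursion.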

II. XSLT 2.0 solution:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

 <xsl:template match="*[not(self::title)]/text()">
  <xsl:for-each select="tokenize(., '[\s]')[.]">
   <span><xsl:sequence select="."/></span>
  </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the same XML document (above), the wanted, correct result is again produced:

<html>
   <head>
      <title>It doesnt matter</title>
   </head>
   <body>
      <div>
         <span>Text</span>
         <span>in</span>
         <span>a</span>
         <span>div</span>
      </div>
      <div>
         <span>Text</span>
         <span>in</span>
         <span>a</span>
         <span>div</span>
         <p>
            <span>Text</span>
            <span>inside</span>
            <span>a</span>
            <span>p</span>
         </p>
      </div>
   </body>
</html>
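The desired output in the question keeps the whitespace between the words, whereas both solutions above discard it and rely on indent="yes". If the original whitespace matters, one possible XSLT 2.0 variant (a sketch, not part of the answer above) uses xsl:analyze-string, which wraps each run of non-whitespace characters and copies everything else through untouched:

```xml
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <!-- No strip-space and no indent here, so the source
      whitespace reaches the output unchanged -->
 <xsl:output method="xml" omit-xml-declaration="yes"/>

 <!-- Identity rule -->
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="*[not(self::title)]/text()">
  <xsl:analyze-string select="." regex="\S+">
   <!-- A run of non-whitespace characters is one word -->
   <xsl:matching-substring>
    <span><xsl:value-of select="."/></span>
   </xsl:matching-substring>
   <!-- Whitespace between words is copied as is -->
   <xsl:non-matching-substring>
    <xsl:value-of select="."/>
   </xsl:non-matching-substring>
  </xsl:analyze-string>
 </xsl:template>
</xsl:stylesheet>
```

Whitespace-only text nodes contain no match, so they pass through unchanged and the layout of the body is preserved.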

III. Solution using FXSL:

Using the str-split-to-words template/function of FXSL, one can easily implement much more complicated tokenization in any version of XSLT.

Let's take a more complicated XML document and tokenization rules:

<html>
    <head>
        <title>It doesnt matter</title>
    </head>
    <body>
        <div> Text: in a div </div>
        <div>
         Text; in; a. div
            <p>
             Text- inside [a] [p]
         </p>
        </div>
    </body>
</html>

Here there is more than one delimiter that indicates the start or end of a word. In this particular example the delimiters can be: " ", ";", ".", ":", "-", "[", "]".
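For comparison, if only XSLT 2.0 needs to be supported, the same delimiter set can probably be handled without FXSL by passing a regular-expression character class to tokenize(). This is a sketch added for illustration, not part of the original answer; the character class simply lists the delimiters named above plus any whitespace.

```xml
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <!-- Identity rule -->
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="*[not(self::title)]/text()">
  <!-- Split on one or more of: whitespace ; . : - [ ]
       The trailing [.] predicate drops the empty tokens that
       appear when the text starts or ends with a delimiter -->
  <xsl:for-each select="tokenize(., '[\s;.:\-\[\]]+')[.]">
   <span><xsl:value-of select="."/></span>
  </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>
```

The + quantifier treats a run such as "; " as a single separator, so consecutive delimiters do not produce empty words.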

The following transformation uses FXSL for this more complicated tokenization:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:ext="http://exslt.org/common"
 exclude-result-prefixes="ext">

   <xsl:import href="strSplit-to-Words.xsl"/>

   <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
   <xsl:strip-space elements="*"/>

    <xsl:template match="node()|@*">
        <xsl:copy>
          <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="*[not(self::title)]/text()">
      <xsl:variable name="vwordNodes">
        <xsl:call-template name="str-split-to-words">
          <xsl:with-param name="pStr" select="normalize-space(.)"/>
          <xsl:with-param name="pDelimiters" 
                          select="' ;.:-[]'"/>
        </xsl:call-template>
      </xsl:variable>

      <xsl:apply-templates select="ext:node-set($vwordNodes)/*"/>
    </xsl:template>

    <xsl:template match="word[string-length(normalize-space(.)) > 0]">
      <span>
        <xsl:value-of select="."/>
      </span>
    </xsl:template>
</xsl:stylesheet>

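When applied to the XML document above, it produces the wanted result. Note the stray empty <word/> at the end of the p element: presumably the split yields a trailing empty word because the text ends with the delimiter "]", and since the last template matches only non-empty word elements, the identity rule copies the empty one: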

<html>
   <head>
      <title>It doesnt matter</title>
   </head>
   <body>
      <div>
         <span>Text</span>
         <span>in</span>
         <span>a</span>
         <span>div</span>
      </div>
      <div>
         <span>Text</span>
         <span>in</span>
         <span>a</span>
         <span>div</span>
         <p>
            <span>Text</span>
            <span>inside</span>
            <span>a</span>
            <span>p</span>
            <word/>
         </p>
      </div>
   </body>
</html>
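The empty word element in the output above is copied by the identity rule, because the only word template matches non-empty words. One way to suppress it, sketched here as a standalone stylesheet, is to add an empty template for empty word elements. To stay self-contained, this sketch assumes its input already contains word elements of the kind the FXSL stylesheet matches (for example a p holding several word children, one of them empty); in the FXSL stylesheet itself only the last template would need to be added.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <!-- Identity rule -->
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <!-- Non-empty words become spans, as in the FXSL solution -->
 <xsl:template match="word[string-length(normalize-space(.)) > 0]">
  <span>
   <xsl:value-of select="."/>
  </span>
 </xsl:template>

 <!-- Empty or whitespace-only words produce no output, so the
      identity rule never gets to copy them -->
 <xsl:template match="word[not(normalize-space(.))]"/>
</xsl:stylesheet>
```

The two word patterns are mutually exclusive, so no priority attribute is needed to decide between them.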
