Trim mixed content to max number of characters with XSLT

Question

I have the following XML:

<p>Lorem ipsum dolor sit amet, <b>consectetur adipisicing</b> elit, <i>sed do<sup>2</sup></i> eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.</p>

I want to show only the first 200 characters, but the text must not be cut off in the middle of a word, and I want to keep the formatting elements. So after the transformation, the fragment above becomes:

<p>Lorem ipsum dolor sit amet, <b>consectetur adipisicing</b> elit, <i>sed do<sup>2</sup></i> eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ...</p>

Does anyone know if this is possible? Thanks in advance!

Recommended answer

This XSLT 2.0 transformation:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xsl:output omit-xml-declaration="yes" indent="no"/>
 <xsl:strip-space elements="*"/>

 <!-- Maximum total number of text characters to keep -->
 <xsl:param name="pmaxChars" as="xs:integer" select="200"/>

 <!-- Pass 1 result: the document without the text nodes that start
      at or past the limit (built in the default mode) -->
 <xsl:variable name="vPass1">
   <xsl:apply-templates select="/*"/>
 </xsl:variable>

 <!-- Identity rule, shared by both passes -->
 <xsl:template match="node()|@*" mode="#default pass2">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*" mode="#current"/>
  </xsl:copy>
 </xsl:template>

 <!-- Entry point: run pass 2 over the result of pass 1 -->
 <xsl:template match="/">
  <xsl:apply-templates select="$vPass1" mode="pass2"/>
 </xsl:template>

 <!-- Pass 1: delete every text node whose first character lies
      at or past the limit -->
 <xsl:template match=
 "text()[sum(preceding::text()/string-length()) ge $pmaxChars]"/>

 <!-- Pass 2: trim the last remaining text node on a word boundary -->
 <xsl:template match="text()[not(following::text())]" mode="pass2">
   <xsl:variable name="vPrecedingLength"
     select="sum(preceding::text()/string-length())"/>

   <!-- Characters still available for this text node -->
   <xsl:variable name="vRemaininingLength"
     select="$pmaxChars -$vPrecedingLength"/>

  <xsl:sequence select=
   "replace(.,
            concat('(^.{0,', $vRemaininingLength, '})\W.*'),
            '$1'
            )
   "/>
 </xsl:template>
</xsl:stylesheet>

when applied to the provided XML document:

<p>Lorem ipsum dolor sit amet, <b>consectetur adipisicing</b> elit, <i>sed do<sup>2</sup></i> eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.</p>

produces the wanted, correct result: an XML document in which the total length of all text nodes does not exceed 200, the truncation falls on a word boundary, and the retained text is the longest that satisfies both conditions:

<p>Lorem ipsum dolor sit amet, <b>consectetur adipisicing</b> elit, <i>sed do<sup>2</sup></i> eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut</p>

Explanation:

This is a generic solution that accepts the maximum number of text characters as a global/external parameter, $pmaxChars.

This is a two-pass solution. In pass 1 the identity rule is overridden by a template that deletes every text node whose starting character has an index (in the concatenation of all text nodes) greater than or equal to the maximum number of allowed characters. Thus, the result of pass 1 is an XML document in which the "break" at the maximum allowed length occurs in the last text node.
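The pass 1 test can be checked by hand on the sample. The following is a minimal Python sketch, not part of the original answer: it lists the sample's text nodes in document order and mirrors the predicate `sum(preceding::text()/string-length()) ge $pmaxChars`. The function names `start_offsets` and `pass1_keep` are hypothetical helpers introduced only for illustration.

```python
def start_offsets(text_nodes):
    """Offset of each text node's first character in the concatenated text."""
    offsets, total = [], 0
    for text in text_nodes:
        # A node starts where the combined length of all earlier nodes ends,
        # which is what sum(preceding::text()/string-length()) computes.
        offsets.append(total)
        total += len(text)
    return offsets


def pass1_keep(text_nodes, max_chars):
    """Keep only the text nodes that start before max_chars."""
    offsets = start_offsets(text_nodes)
    return [t for t, start in zip(text_nodes, offsets) if start < max_chars]


# Text nodes of the sample <p>, in document order.
sample = [
    "Lorem ipsum dolor sit amet, ",
    "consectetur adipisicing",
    " elit, ",
    "sed do",
    "2",
    " eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad "
    "minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip "
    "ex ea commodo consequat.",
]

print(start_offsets(sample))
print(len(pass1_keep(sample, 200)), len(pass1_keep(sample, 60)))
```

With a limit of 200 every text node of the sample starts before the limit, so pass 1 deletes nothing and the whole cut happens in pass 2. A smaller limit such as 60 would drop the trailing nodes and leave "sed do" as the last text node.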

In pass 2 we override the identity rule with a template that matches the last text node. We use the replace() function:

replace(.,
            concat('(^.{0,', $vRemaininingLength, '})\W.*'),
            '$1'
            )

This causes the complete string to be matched and replaced by the subexpression between the parentheses. The pattern is constructed dynamically: the subexpression matches the longest substring that starts at the beginning of the string, contains from 0 to $vRemaininingLength characters (the maximum allowed length minus the total length of all preceding text nodes), and is immediately followed by a non-word character (\W), which marks the word boundary.
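To see the pattern at work outside the stylesheet, here is a small Python sketch that applies the same regular expression to the last text node of the sample. It is an illustration only: Python's `re.sub` stands in for XPath's `replace()`, the two regex dialects are close but not identical, and `trim_last_text` is a hypothetical helper name. The value 65 is the combined length of the five text nodes that precede the last one in the sample.

```python
import re


def trim_last_text(text, remaining):
    """Cut text back to a non-word character within `remaining` characters.

    Assumes remaining >= 0, which pass 1 guarantees in the stylesheet.
    """
    # Same pattern the stylesheet builds with concat(): up to `remaining`
    # characters, then one non-word character, then the rest of the string.
    pattern = r"(^.{0,%d})\W.*" % remaining
    return re.sub(pattern, r"\1", text, count=1)


last_node = (
    " eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad "
    "minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip "
    "ex ea commodo consequat."
)

max_chars = 200
preceding_length = 65  # "Lorem ... sed do" plus the "2" inside <sup>
trimmed = trim_last_text(last_node, max_chars - preceding_length)
print(trimmed)
```

Two properties of the pattern are worth knowing. A text node with no non-word character at all is returned unchanged, because the pattern cannot match. A final text node that already fits within the limit is still cut back to its last non-word character, so `trim_last_text("fits. ok", 50)` gives `"fits."`.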

Update:

To get rid of result elements that, because of the trimming, no longer have any text node descendants (are "empty"), simply add this additional template:

 <xsl:template match=
 "*[(.//text())[1][sum(preceding::text()/string-length()) ge $pmaxChars]]"/>
