XSLT: remove duplicate br-tags from running text
Problem description
When editing rich text content, our CMS generates XML files with duplicate <br/> tags. I'd like to remove them in order to generate output that can be read by another application that does not tolerate those duplicates.
Example input:
<p>
Lorem ipsum...<br />
<br />
..dolor sit
</p>
It should produce something like this:
<p>
Lorem ipsum...<br />
..dolor sit
</p>
I am already using XSLT to manipulate the output in some other ways, and I have found some regexp and PHP examples that do the same thing. I just think it would be better to do this with XSLT, given the speed of the engine in our CMS (Roxen).
Thanks in advance!
Recommended answer
Building off @Nic's answer, you could use
<xsl:template match='br[preceding-sibling::node()[1][self::br]]'/>
I've just changed * to node(). This solves the problem of conflating two <br/>s that have text in between. However, it also stops removing duplicate <br/>s whenever anything sits between them, even if it is only a whitespace-only text node.
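To make the trade-off concrete, here is a small illustrative input (the wrapper element and the sample text are made up for this sketch, and the * variant is assumed to be the same pattern with preceding-sibling::*[1] in place of preceding-sibling::node()[1]):

```xml
<examples>
  <!-- (a) Adjacent duplicates: the node right before the second br is a br.
       Both the * pattern and the node() pattern suppress the second br. -->
  <p>Lorem ipsum...<br/><br/>..dolor sit</p>

  <!-- (b) Text in between: the nearest preceding element is a br, so the
       * pattern wrongly suppresses the second br. The nearest preceding
       node is text, so the node() pattern keeps it. -->
  <p>Lorem ipsum...<br/>..dolor sit<br/>amet</p>

  <!-- (c) Only a line break in between: the nearest preceding node is a
       whitespace-only text node, so the node() pattern keeps both br
       elements, although they are duplicates for our purposes. -->
  <p>Lorem ipsum...<br/>
<br/>..dolor sit</p>
</examples>
```

Case (c) is the shape of the example input in the question, which is why the node() pattern alone is not enough.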
To solve that...
Deprecated: at first I had suggested you could strip whitespace-only nodes from p elements in the input doc, by putting this at the top level of your XSLT:
<xsl:strip-space elements="p"/>
But @Alejandro pointed out that this could easily cause you to lose important spaces, as in <p><em>bar</em> <em>baz</em></p>, where the space between the two em elements is a whitespace-only text node.
Instead, use this modified match pattern:
<xsl:template match='br[preceding-sibling::node()
[not(self::text() and normalize-space(.) = "")][1]
[self::br]]'/>
Kind of ugly, but it should work. This will match and suppress "any br for which the nearest preceding sibling node that is not a whitespace-only text node is also a br." :-)
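For context, the template above only suppresses nodes, so it needs something else to copy the rest of the document. A minimal sketch of a complete stylesheet, assuming XSLT 1.0 and a standard identity template (neither is stated in the question), could look like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity transform (an assumption of this sketch): copy every node
       and attribute unchanged unless a more specific template matches. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Suppress a br whose nearest preceding sibling, ignoring
       whitespace-only text nodes, is also a br. The empty template
       body means nothing is written to the output for such a br. -->
  <xsl:template match='br[preceding-sibling::node()
      [not(self::text() and normalize-space(.) = "")][1]
      [self::br]]'/>

</xsl:stylesheet>
```

The predicated br pattern has a higher default priority than the generic node() pattern, so it wins for duplicate br elements without an explicit priority attribute. Note that the whitespace-only text nodes between the br elements are still copied to the output; only the redundant br is dropped.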
Given that the match pattern is so complex, you may prefer to move some of that logic into the template body, as follows. I guess this is more a matter of personal taste and style:
<xsl:template match="br">
  <xsl:if test="not(preceding-sibling::node()
      [not(self::text() and normalize-space(.) = '')][1]
      [self::br])">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:if>
</xsl:template>
Here we use a copy of the identity transform when the <br /> is not one we want to suppress. I don't think <br /> can take child elements or text, but it doesn't hurt to be safe.
(Updated the above. I had forgotten to finish that sample code last time I saved edits.)