How to use XSLT 1.0 or XPath to manipulate an HTML string


Question

This is my problem: the code snippet below (inside the <xsl:choose>) does not reliably strip <p>, <div> or <br> tags out of a string using a combination of the substring-before() and substring() functions.

The string I'm trying to format is an attribute of a SharePoint SPS 2003 list item: text entered via a rich text editor. What I ideally need is a catch-all <xsl:when> test that will always grab just the text before the first line break in the string (effectively the first paragraph). I thought that:

<xsl:when test="contains(Story, '&#x0a;')='True'">

would do that, but it doesn't always work: although the rich text editor inserts <br> and <p> tags, it appears that these are not always represented by the &#x0a; value.

Please help - this is driving me nuts. Code:

<xsl:choose>
  <xsl:when test="contains(Story, '&#x0a;')">
    <div>PTAG_OPEN_OR_BR<xsl:value-of select="substring-before(Story,'&#x0a;')" disable-output-escaping="yes"/></div>
  </xsl:when>
  <xsl:when test="contains(Story, '&#x0a;') and contains(Story, 'div>')">
    <div>DTAG<xsl:value-of select="substring-before(substring-after(substring-before(Story, '/div>'), 'div>'),'&#x0a;')" disable-output-escaping="yes"/></div>
  </xsl:when>
  <xsl:when test="contains(Story, '&#x0a;')!='True' and contains(Story, 'br>')">
    <div>BRTAG<xsl:value-of select="substring(Story, 1, string-length(substring-before(Story, 'br>')-1))" disable-output-escaping="yes"/></div>
  </xsl:when>            
  <xsl:otherwise>
    <div>NO_TAG<xsl:value-of select="substring(Story, 1, 150)" disable-output-escaping="yes"/></div>
  </xsl:otherwise>
</xsl:choose>

Will try out your suggestion, Tomalak. Thank you.

12/11/09

I've only just had a chance to try this out. Thanks for your help, Tomalak. I have one question about rendering this as HTML rather than XML. When I call the template removeMarkup, I get the following error message:

Exception: System.Xml.XmlException Message: '<', hexadecimal value 0x3C, is an invalid attribute character. Line 120, position 58.

I'm not sure, but I believe this is because you can't have XSLT tags inside attributes? Is there any way around this?

Thanks, Tim

Recommended answer


A <p> or <br> is very probably represented by the editor as a literal <p> or <br>, not as &#x0a;. ;-)

Line break characters are not required anywhere in HTML, so if the editor decides not to include any line breaks, it's still fine. Relying on line breaks is an error on your part, IMHO.

Apart from that, without sample XML it is anybody's guess what XPath might do the trick for you.

I suggest a template that removes any HTML markup from a string (by recursive string processing). Then you can take the first meaningful bit of text from the result and print it out.

With this input:

<test>
  <Story>&lt;div&gt;&lt;p&gt;The quick brown fox jumped over the lazy dog&lt;/p&gt;&lt;p&gt;The quick brown fox jumped over the lazy dog&lt;/p&gt;&lt;/div&gt;</Story>
  <Story>&lt;div&gt;&lt;p&gt;The quick brown fox jumped over the lazy dog&lt;/p&gt;&lt;p&gt;The quick brown fox jumped over the lazy dog&lt;/p&gt;&lt;/div&gt;</Story>
  <Story>The quick brown fox jumped over the lazy dog.&lt;br&gt;The quick brown fox jumped over the lazy dog.</Story>
  <Story>The quick brown fox jumped over the lazy dog.</Story>
</test>

and this stylesheet:

<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
  <xsl:output method="xml" encoding="utf-8" />

  <xsl:template match="Story">
    <xsl:copy>
      <original>
        <xsl:value-of select="." />
      </original>
      <processed>
        <xsl:variable name="result">
          <xsl:call-template name="removeMarkup">
            <xsl:with-param name="html" select="." />
          </xsl:call-template>
        </xsl:variable>
        <!-- select the bit of text before the '<>' delimiter -->
        <xsl:value-of select="substring-before($result, '&lt;&gt;')" />
      </processed>
    </xsl:copy>
  </xsl:template>

  <!-- this template removes all HTML markup (tags) from a string -->
  <xsl:template name="removeMarkup">
    <xsl:param name="html"  select="''" />
    <xsl:param name="inTag" select="false()" />

    <!-- if we are in a tag, we look for the next '>', otherwise for '<' -->    
    <xsl:variable name="lookFor">
      <xsl:choose>
        <xsl:when test="$inTag">&gt;</xsl:when>
        <xsl:otherwise>&lt;</xsl:otherwise>
      </xsl:choose>
    </xsl:variable>

    <!-- split the input at the current delimiter char -->
    <xsl:variable name="head" select="substring-before(concat($html, '&lt;'), $lookFor)" />
    <xsl:variable name="tail" select="substring-after($html, $lookFor)" />

    <xsl:if test="not($inTag)">
      <xsl:value-of select="$head" />
      <!-- now add a unique delimiter after the first actual text -->
      <xsl:if test="translate(normalize-space($head), ' ', '') != ''">
        <xsl:value-of select="'&lt;&gt;'" /> <!-- '<>' as a delimiter -->
      </xsl:if>
    </xsl:if>

    <!-- remove markup for the rest of the string -->
    <xsl:if test="$tail != ''">
      <xsl:call-template name="removeMarkup">
        <xsl:with-param name="html"  select="$tail" />
        <xsl:with-param name="inTag" select="not($inTag)" />
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>

the following result is produced:

<Story>
  <original>&lt;div&gt;&lt;p&gt;The quick brown fox jumped over the lazy dog&lt;/p&gt;&lt;p&gt;The quick brown fox jumped over the lazy dog&lt;/p&gt;&lt;/div&gt;</original>
  <processed>The quick brown fox jumped over the lazy dog</processed>
</Story>
<Story>
  <original>&lt;div&gt;&lt;p&gt;The quick brown fox jumped over the lazy dog&lt;/p&gt;&lt;p&gt;The quick brown fox jumped over the lazy dog&lt;/p&gt;&lt;/div&gt;</original>
  <processed>The quick brown fox jumped over the lazy dog</processed>
</Story>
<Story>
  <original>The quick brown fox jumped over the lazy dog.&lt;br&gt;The quick brown fox jumped over the lazy dog.</original>
  <processed>The quick brown fox jumped over the lazy dog.</processed>
</Story>
<Story>
  <original>The quick brown fox jumped over the lazy dog.</original>
  <processed>The quick brown fox jumped over the lazy dog.</processed>
</Story>

Disclaimer: As with all string processing over HTML input, this is not 100% foolproof, and certain malformed input can break it (for example, a literal > inside an attribute value would be read as the end of the tag).
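
On the follow-up error in the question: System.Xml most likely reports '<' as an invalid attribute character because the stylesheet source has an element such as <xsl:call-template> written inside an attribute value, which XML does not allow. Below is a minimal sketch of one way around it, assuming the goal is to place the stripped text in an attribute. It captures the template output in a variable first, then references it with an attribute value template. The file name remove-markup.xsl (the stylesheet above saved to disk), the title attribute and the summary variable are assumptions for illustration, not part of the original setup.

```xml
<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
  <!-- hypothetical file name: the stylesheet above, saved next to this one -->
  <xsl:import href="remove-markup.xsl" />

  <!-- serialize as HTML instead of XML -->
  <xsl:output method="html" encoding="utf-8" />

  <!-- overrides the imported match="Story" template -->
  <xsl:template match="Story">
    <!-- call removeMarkup outside of any attribute and keep its output -->
    <xsl:variable name="result">
      <xsl:call-template name="removeMarkup">
        <xsl:with-param name="html" select="." />
      </xsl:call-template>
    </xsl:variable>
    <!-- the text before the '<>' delimiter, as in the original template -->
    <xsl:variable name="summary" select="substring-before($result, '&lt;&gt;')" />

    <!-- {$summary} is an attribute value template; 'title' is only an example -->
    <div title="{$summary}">
      <xsl:value-of select="$summary" />
    </div>
  </xsl:template>
</xsl:stylesheet>
```

The imported Story template is overridden because templates in the importing stylesheet have higher import precedence. Where xsl:import is not an option, the same variable and {$summary} pattern can be placed directly in the existing template; an <xsl:attribute name="title"> element is an equivalent alternative.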
