Converting XML to plain text - how should I ignore/handle whitespace in the XSLT?


Question

I'm trying to convert an XML file into the markup used by dokuwiki, using XSLT. This works to some degree, but the indentation in the XSL file is getting inserted into the results. At the moment, I have two choices: abandon XSLT entirely and find another way to convert from XML to dokuwiki markup, or delete about 95% of the whitespace from the XSL file, making it nigh-unreadable and a maintenance nightmare.

Is there some way to keep the indentation in the XSL file without passing all that whitespace on to the final document?

Background: I'm migrating an autodoc tool from static HTML pages over to dokuwiki, so the API developed by the server team can be further documented by the applications team whenever the apps team runs into poorly documented code. The logic is to have a section of each page set aside for the autodoc tool, and to allow comments anywhere outside this block. I'm using XSLT because we already have the XSL file to convert from XML to XHTML, and I'm assuming it will be faster to rewrite the XSL than to roll my own solution from scratch.

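Ah, right, silly me, I had overlooked the indent attribute. (Other background note: I'm new to XSLT.) On the other hand, I still have to deal with newlines. Dokuwiki uses pipes to separate table columns, which means that all of the data in a table row must be on one line. Is there a way to suppress newlines in the output (just occasionally), so I can do some fairly complex logic for each table cell in a somewhat readable fashion?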

Recommended answer

There are three reasons for getting unwanted whitespace in the result of an XSLT transformation:

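  1. whitespace that comes from between nodes in the source document
  2. whitespace that comes from within nodes in the source document
  3. whitespace that comes from the stylesheet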

I'm going to cover all three, because it can be hard to tell where whitespace comes from, so you might need to use several strategies.

To address the whitespace that is between nodes in your source document, you should use <xsl:strip-space> to strip out any whitespace that appears between two nodes, and then use <xsl:preserve-space> to preserve the significant whitespace that might appear within mixed content. For example, if your source document looks like:

<ul>
  <li>This is an <strong>important</strong> <em>point</em></li>
</ul>

then you will want to ignore the whitespace between the <ul> and the <li> and between the </li> and the </ul>, which is not significant, but preserve the whitespace between the <strong> and <em> elements, which is significant (otherwise you'd get "This is an **important***point*"). To do this, use:

    <xsl:strip-space elements="*" />
    <xsl:preserve-space elements="li" />
    

The elements attribute on <xsl:preserve-space> should basically list all the elements in your document that have mixed content.
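As a minimal sketch of how these two declarations fit into a complete stylesheet, the example below turns the <ul> sample above into dokuwiki markup. The templates for li, strong and em, and the choice of dokuwiki's "  * ", "**" and "//" syntax for them, are assumptions made for illustration; they are not taken from the asker's actual stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Plain-text output: no tags and no pretty-printing in the result. -->
  <xsl:output method="text" />

  <!-- Drop whitespace-only text nodes everywhere in the source... -->
  <xsl:strip-space elements="*" />
  <!-- ...except inside li, which holds mixed content. -->
  <xsl:preserve-space elements="li" />

  <!-- Assumed mapping: each li becomes one dokuwiki bullet line. -->
  <xsl:template match="li">
    <xsl:text>  * </xsl:text>
    <xsl:apply-templates />
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Assumed mapping: strong becomes dokuwiki bold. -->
  <xsl:template match="strong">
    <xsl:text>**</xsl:text>
    <xsl:apply-templates />
    <xsl:text>**</xsl:text>
  </xsl:template>

  <!-- Assumed mapping: em becomes dokuwiki italics. -->
  <xsl:template match="em">
    <xsl:text>//</xsl:text>
    <xsl:apply-templates />
    <xsl:text>//</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the sample <ul>, this should produce a single bullet line reading "  * This is an **important** //point//". The space between the bold and italic parts survives because it is a child of li, while the indentation around the <li> is a child of ul and is stripped.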

Aside: using <xsl:strip-space> also reduces the size of the source tree in memory, and makes your stylesheet more efficient, so it's worth doing even if you don't have whitespace problems of this sort.

To address the whitespace that appears within nodes in your source document, you should use normalize-space(). For example, if you have:

    <dt>
      a definition
    </dt>
    

and you can be sure that the <dt> element won't hold any elements that you want to do something with, then you can do:

    <xsl:template match="dt">
      ...
      <xsl:value-of select="normalize-space(.)" />
      ...
    </xsl:template>
    

The leading and trailing whitespace will be stripped from the value of the <dt> element and you will just get the string "a definition".
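Besides trimming both ends, normalize-space() also collapses every internal run of whitespace, line breaks included, into a single space. That matters when the source text is wrapped over several lines. The sketch below is a complete stylesheet built around the template above; the square brackets are there only to make the edges of the output visible, and the multi-line <dt> in the comment is a made-up input.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" />

  <!--
    Hypothetical input:

      <dt>
        a    longer
        definition
      </dt>

    Output for that input: [a longer definition]
  -->
  <xsl:template match="dt">
    <xsl:text>[</xsl:text>
    <!-- Trims both ends and collapses internal whitespace runs. -->
    <xsl:value-of select="normalize-space(.)" />
    <xsl:text>]</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Because normalize-space() works on the string value of the element, any child markup inside <dt> is flattened to text, which is why this only suits elements whose children you don't need to process.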

Whitespace coming from the stylesheet, which is perhaps what you're experiencing, appears when you have literal text within a template like this:

    <xsl:template match="name">
      Name:
      <xsl:value-of select="." />
    </xsl:template>
    

XSLT stylesheets are parsed in the same way as the source documents that they process, so the above XSLT is interpreted as a tree that holds an <xsl:template> element with a match attribute, whose first child is a text node and whose second child is an <xsl:value-of> element with a select attribute. The text node has leading and trailing whitespace (including line breaks); since it's literal text in the stylesheet, it gets copied verbatim into the result, with all the leading and trailing whitespace.

But some whitespace in XSLT stylesheets gets stripped automatically, namely text nodes between elements that consist only of whitespace. For example, the line break between the <xsl:value-of> and the close of the <xsl:template> does not produce a line break in your result.

To get only the text you want in the result, use the <xsl:text> element like this:

    <xsl:template match="name">
      <xsl:text>Name: </xsl:text>
      <xsl:value-of select="." />
    </xsl:template>
    

The XSLT processor will ignore the line breaks and indentation that appear between nodes, and only output the text within the <xsl:text> element.
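The same technique should cover the table-row problem raised in the question: when every literal piece of output lives in an <xsl:text> element, the template can be indented and spread over as many lines as the logic needs, and the row still comes out on one line. The sketch below assumes a hypothetical source format in which each param element has name and desc children; those element names and the "(undocumented)" fallback are invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" />
  <xsl:strip-space elements="*" />

  <!-- One dokuwiki table row per (hypothetical) param element. -->
  <xsl:template match="param">
    <xsl:text>| </xsl:text>
    <xsl:value-of select="normalize-space(name)" />
    <xsl:text> | </xsl:text>
    <!-- Per-cell logic can span many lines without adding newlines
         to the output, because only xsl:text content is emitted. -->
    <xsl:choose>
      <xsl:when test="normalize-space(desc)">
        <xsl:value-of select="normalize-space(desc)" />
      </xsl:when>
      <xsl:otherwise>
        <xsl:text>(undocumented)</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
    <!-- The only line break in the row is this explicit one. -->
    <xsl:text> |&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

For a param whose name is "timeout" and whose desc is wrapped over two source lines, this should yield one row of the form "| timeout | Seconds to wait before giving up |", since normalize-space() folds the wrapped description back onto a single line.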
