Converting XML to plain text: how should I ignore/handle whitespace in the XSLT?

Question

I'm trying to convert an XML file into the markup used by dokuwiki, using XSLT. This actually works to some degree, but the indentation in the XSL file is getting inserted into the results. At the moment, I have two choices: abandon this XSLT thing entirely, and find another way to convert from XML to dokuwiki markup, or delete about 95% of the whitespace from the XSL file, making it nigh-unreadable and a maintenance nightmare.

Is there some way to keep the indentation in the XSL file without passing all that whitespace on to the final document?

Background: I'm migrating an autodoc tool from static HTML pages over to dokuwiki, so the API developed by the server team can be further documented by the applications team whenever the apps team runs into poorly-documented code. The logic is to have a section of each page set aside for the autodoc tool, and to allow comments anywhere outside this block. I'm using XSLT because we already have the XSL file to convert from XML to XHTML, and I'm assuming it will be faster to rewrite the XSL than to roll my own solution from scratch.

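Ah, right, silly me, I'd overlooked the indent attribute. (Other background note: I'm new to XSLT.) On the other hand, I still have to deal with line breaks. Dokuwiki uses pipes to separate table columns, which means all of the data in a table row must be on a single line. Is there a way to suppress line breaks in the output (only occasionally), so that I can do some fairly complex logic for each table cell in a reasonably readable way?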

Recommended answer

There are three reasons for getting unwanted whitespace in the result of an XSLT transformation:

  1. whitespace that comes from between nodes in the source document
  2. whitespace that comes from within nodes in the source document
  3. whitespace that comes from the stylesheet

I'm going to talk about all three, because it can be hard to tell where whitespace comes from, so you might need to use several strategies.

To address the whitespace that is between nodes in your source document, you should use <xsl:strip-space> to strip out any whitespace that appears between two nodes, and then use <xsl:preserve-space> to preserve the significant whitespace that might appear within mixed content. For example, if your source document looks like:

<ul>
  <li>This is an <strong>important</strong> <em>point</em></li>
</ul>

then you will want to ignore the whitespace between the <ul> and the <li> and between the </li> and the </ul>, which is not significant, but preserve the whitespace between the <strong> and <em> elements, which is significant (otherwise you'd get "This is an **important***point*"). To do this use

<xsl:strip-space elements="*" />
<xsl:preserve-space elements="li" />

The elements attribute on <xsl:preserve-space> should list every element in your document that can contain mixed content.
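
To see what those two declarations do to the <ul> example, here is a small model in Python. It uses only the standard library's xml.etree.ElementTree and is not a real XSLT processor: the hypothetical strip_space() helper drops every whitespace-only text node unless its parent element's tag is in a `preserve` set, which is roughly how <xsl:strip-space elements="*"/> combined with <xsl:preserve-space elements="li"/> behaves. It ignores xml:space attributes and namespaced name tests, which a real processor also takes into account.

```python
import xml.etree.ElementTree as ET

XML_WS = " \t\r\n"  # the four characters XML treats as whitespace


def is_ws_only(text):
    """True for a non-empty text node made up solely of XML whitespace."""
    return bool(text) and text.strip(XML_WS) == ""


def strip_space(elem, preserve=frozenset()):
    """Rough model of <xsl:strip-space elements="*"/> plus
    <xsl:preserve-space elements="..."/>: drop every whitespace-only text
    node whose parent element's tag is not in `preserve`.

    ElementTree stores text nodes in .text (parent is `elem` itself) and in
    .tail (parent is the element that contains `child`)."""
    keep = elem.tag in preserve
    if not keep and is_ws_only(elem.text):
        elem.text = None
    for child in elem:
        if not keep and is_ws_only(child.tail):
            child.tail = None
        strip_space(child, preserve)
    return elem


def text_of(elem):
    """Concatenate the remaining text, roughly what the built-in template
    rules would emit with <xsl:output method="text"/>."""
    return "".join(elem.itertext())


SOURCE = """<ul>
  <li>This is an <strong>important</strong> <em>point</em></li>
</ul>"""

# Strip everywhere except inside <li>: the space between </strong> and <em> survives.
print(repr(text_of(strip_space(ET.fromstring(SOURCE), preserve={"li"}))))
# -> 'This is an important point'

# Strip everywhere: that significant space is lost as well.
print(repr(text_of(strip_space(ET.fromstring(SOURCE)))))
# -> 'This is an importantpoint'
```

The second call reproduces the "**important***point*" problem described above, which is why mixed-content elements need to be listed in <xsl:preserve-space>.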

Aside: using <xsl:strip-space> also reduces the size of the source tree in memory, and makes your stylesheet more efficient, so it's worth doing even if you don't have whitespace problems of this sort.

To address the whitespace that appears within nodes in your source document, you should use normalize-space(). For example, if you have:

<dt>
  a definition
</dt>

and you can be sure that the <dt> element won't hold any elements that you want to do something with, then you can do:

<xsl:template match="dt">
  ...
  <xsl:value-of select="normalize-space(.)" />
  ...
</xsl:template>

The leading and trailing whitespace will be stripped from the value of the <dt> element (any internal runs of whitespace would also be collapsed to a single space), and you will just get the string "a definition".
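
To see exactly which characters are affected, here is a small Python model of the XPath 1.0 normalize-space() function. It is an illustration of the function's definition, not how any particular processor implements it, and normalize_space() is a hypothetical helper name.

```python
import re
import xml.etree.ElementTree as ET


def normalize_space(s):
    """Model of XPath 1.0 normalize-space(): trim leading and trailing XML
    whitespace and collapse each internal run of it to a single space."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


dt = ET.fromstring("<dt>\n  a definition\n</dt>")
print(repr(dt.text))                    # '\n  a definition\n'
print(repr(normalize_space(dt.text)))   # 'a definition'
print(repr(normalize_space("  a \t\n  spaced   term ")))  # 'a spaced term'
```

Because internal runs are collapsed too, normalize-space() suits short labels like this one; it would also flatten whitespace you actually wanted to keep inside longer text.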

Whitespace coming from the stylesheet, which is perhaps the kind you're experiencing, appears when you have text within a template like this:

<xsl:template match="name">
  Name:
  <xsl:value-of select="." />
</xsl:template>

XSLT stylesheets are parsed in the same way as the source documents that they process, so the above XSLT is interpreted as a tree that holds an <xsl:template> element with a match attribute, whose first child is a text node and whose second child is an <xsl:value-of> element with a select attribute. The text node has leading and trailing whitespace (including line breaks); since it's literal text in the stylesheet, it gets copied verbatim into the result, with all of that leading and trailing whitespace.

But some whitespace in XSLT stylesheets gets stripped automatically, namely whitespace-only text nodes between elements. That is why the line break between the <xsl:value-of> and the close of the <xsl:template> does not show up in your result.
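
You can see the two text nodes involved by parsing the template as ordinary XML. The Python snippet below uses the standard library's ElementTree; the xmlns:xsl declaration is added only so that the fragment parses on its own. The first text node contains "Name:", so it is kept along with its surrounding whitespace; the text after <xsl:value-of> is whitespace-only, so it is the one the XSLT processor strips.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# The template from the answer, with a namespace declaration so it parses standalone.
TEMPLATE = f"""<xsl:template xmlns:xsl="{XSL_NS}" match="name">
  Name:
  <xsl:value-of select="." />
</xsl:template>"""

template = ET.fromstring(TEMPLATE)
value_of = template[0]

# First child of <xsl:template>: has real content, so it is copied verbatim.
print(repr(template.text))   # '\n  Name:\n  '
# Text after <xsl:value-of>: whitespace only, so the processor drops it.
print(repr(value_of.tail))   # '\n'
print(value_of.tag)          # {http://www.w3.org/1999/XSL/Transform}value-of
```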

To get only the text you want in the result, use the <xsl:text> element like this:

<xsl:template match="name">
  <xsl:text>Name: </xsl:text>
  <xsl:value-of select="." />
</xsl:template>

The XSLT processor will ignore the line breaks and indentation that appear between nodes, and only output the text within the <xsl:text> element.
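
The difference between the two templates, and the dokuwiki table-row problem from the question, comes down to which text nodes survive. The Python sketch below is a toy model, not an XSLT processor: the hypothetical run_template() helper understands only literal text, <xsl:text> and <xsl:value-of select="."/>, and the value "Alice" stands in for the context node's string value. It applies the stylesheet rule described above: whitespace-only text nodes disappear, while everything inside <xsl:text> is kept.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
XSL = "{" + XSL_NS + "}"
XML_WS = " \t\r\n"


def run_template(source, value):
    """Toy model of instantiating a template whose body uses only literal
    text, <xsl:text> and <xsl:value-of select="."/>. `value` stands in for
    the string value of the context node (an assumption of this sketch)."""
    template = ET.fromstring(source)
    out = []

    def literal(text):
        # Stylesheet rule: whitespace-only text nodes are stripped; any other
        # literal text node is copied verbatim, surrounding whitespace included.
        if text and text.strip(XML_WS):
            out.append(text)

    literal(template.text)
    for child in template:
        if child.tag == XSL + "text":
            out.append(child.text or "")  # <xsl:text> content is always kept
        elif child.tag == XSL + "value-of":
            out.append(value)
        literal(child.tail)
    return "".join(out)


BEFORE = f"""<xsl:template xmlns:xsl="{XSL_NS}" match="name">
  Name:
  <xsl:value-of select="." />
</xsl:template>"""

AFTER = f"""<xsl:template xmlns:xsl="{XSL_NS}" match="name">
  <xsl:text>Name: </xsl:text>
  <xsl:value-of select="." />
</xsl:template>"""

# A hypothetical one-cell dokuwiki row: the only line break is the explicit &#10;.
ROW = f"""<xsl:template xmlns:xsl="{XSL_NS}" match="name">
  <xsl:text>| </xsl:text>
  <xsl:value-of select="." />
  <xsl:text> |&#10;</xsl:text>
</xsl:template>"""

print(repr(run_template(BEFORE, "Alice")))  # '\n  Name:\n  Alice'
print(repr(run_template(AFTER, "Alice")))   # 'Name: Alice'
print(repr(run_template(ROW, "Alice")))     # '| Alice |\n'
```

The ROW case suggests one way to handle the table requirement: indent the stylesheet as much as you like, and emit line breaks only where you put them explicitly, for example with &#10; inside <xsl:text>.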
