Sort elements of arbitrary XML document recursively


Problem description

I'm trying to sort and canonicalize some XML documents. The desired end result is that:

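  1. Every element's children are in alphabetical order
  2. Every element's attributes are in alphabetical order
  3. Comments are removed
  4. All elements are properly spaced (i.e. "pretty print").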

I have achieved all of these goals except #1.

I have been using this answer as my template. Here is what I have so far:

import javax.xml.transform.stream.StreamResult
import javax.xml.transform.stream.StreamSource
import javax.xml.transform.TransformerFactory
import org.apache.xml.security.c14n.Canonicalizer

// Initialize the security library
org.apache.xml.security.Init.init()

// Create some variables

// Get arguments

// Make sure required arguments have been provided

if(!error) {
    // Create some variables
    def ext = fileInName.tokenize('.').last()
    fileOutName = fileOutName ?: "${fileInName.lastIndexOf('.').with {it != -1 ? fileInName[0..<it] : fileInName}}_CANONICALIZED_AND_SORTED.${ext}"
    def fileIn = new File(fileInName)
    def fileOut = new File(fileOutName)
    def xsltFile = new File(xsltName)
    def temp1 = new File("./temp1")
    def temp2 = new File("./temp2")
    def os
    def is

    // Sort the XML attributes, remove comments, and remove extra whitespace
    println "Canonicalizing..."
    Canonicalizer c = Canonicalizer.getInstance(Canonicalizer.ALGO_ID_C14N_OMIT_COMMENTS)
    os = temp1.newOutputStream()
    c.setWriter(os)
    c.canonicalize(fileIn.getBytes())
    os.close()

    // Sort the XML elements
    println "Sorting..."
    def factory = TransformerFactory.newInstance()
    is = xsltFile.newInputStream()
    def transformer = factory.newTransformer(new StreamSource(is))
    is.close()
    is = temp1.newInputStream()
    os = temp2.newOutputStream()
    transformer.transform(new StreamSource(is), new StreamResult(os))
    is.close()
    os.close()

    // Write the XML output in "pretty print"
    println "Beautifying..."
    def parser = new XmlParser()
    def printer = new XmlNodePrinter(new IndentPrinter(fileOut.newPrintWriter(), "    ", true))
    printer.print parser.parseText(temp2.getText())

    // Cleanup
    temp1.delete()
    temp2.delete()

    println "Done!"
}

The full script is here.

XSLT:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="foo">
    <foo>
      <xsl:apply-templates>
        <xsl:sort select="name()"/>
      </xsl:apply-templates>
    </foo>
  </xsl:template>
</xsl:stylesheet>

Example input XML:

<foo b="b" a="a" c="c">
    <qwer>
    <zxcv c="c" b="b"/>
    <vcxz c="c" b="b"/>
    </qwer>
    <baz e="e" d="d"/>
    <bar>
    <fdsa g="g" f="f"/>
    <asdf g="g" f="f"/>
    </bar>
</foo>

Desired output XML:

<foo a="a" b="b" c="c">
    <bar>
        <asdf f="f" g="g"/>
        <fdsa f="f" g="g"/>
    </bar>
    <baz d="d" e="e"/>
    <qwer>
        <vcxz b="b" c="c"/>
        <zxcv b="b" c="c"/>
    </qwer>
</foo>

How can I make the transform apply to all elements, so that every element's children end up in alphabetical order?

Recommended answer

If you want the transform to apply to all elements, you need a template that matches all elements, rather than one that matches only the specific "foo" element:

<xsl:template match="*">

Note that you would also have to change the current template that matches "node()" so that it excludes elements:

 <xsl:template match="node()[not(self::*)]|@*">

Within the new match="*" template, you will also need code to select the attributes, because your "foo" template currently ignores them (<xsl:apply-templates /> does not select attributes).
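To make the point about attributes concrete, here is a minimal Java sketch that runs two tiny stylesheets through the JDK's built-in `javax.xml.transform` API. The class name `AttributeSelectionDemo`, its constants, and the test input are hypothetical and are not part of the original script. The first stylesheet only applies templates to children, the second also selects `@*`.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; names are illustrative only.
final class AttributeSelectionDemo {

    private static final String HEADER =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output omit-xml-declaration='yes'/>";

    // Element template that applies templates to child nodes only.
    static final String CHILDREN_ONLY = HEADER
      + "<xsl:template match='*'><xsl:copy>"
      + "<xsl:apply-templates><xsl:sort select='name()'/></xsl:apply-templates>"
      + "</xsl:copy></xsl:template></xsl:stylesheet>";

    // Same element template, but attributes are selected explicitly and copied.
    static final String WITH_ATTRIBUTES = HEADER
      + "<xsl:template match='@*'><xsl:copy/></xsl:template>"
      + "<xsl:template match='*'><xsl:copy>"
      + "<xsl:apply-templates select='@*'/>"
      + "<xsl:apply-templates><xsl:sort select='name()'/></xsl:apply-templates>"
      + "</xsl:copy></xsl:template></xsl:stylesheet>";

    // Applies the given stylesheet text to the given XML text and returns the result.
    static String transform(String xslt, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<foo a='a'><baz d='d'/><bar/></foo>";
        System.out.println(transform(CHILDREN_ONLY, xml));   // attributes are lost
        System.out.println(transform(WITH_ATTRIBUTES, xml)); // attributes are kept
    }
}
```

In both cases the children come out sorted by name; only the second stylesheet carries the attributes over, because `xsl:copy` on an element does not copy its attributes by itself.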

Actually, looking at your requirements, items 1 to 3 can all be done with a single XSLT. For example, to remove comments, you could simply exclude them from the template that currently matches node():

<xsl:template match="node()[not(self::comment())][not(self::*)]|@*">

Try the following XSLT, which should achieve points 1 to 3:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="node()[not(self::comment())][not(self::*)]|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates select="@*">
        <xsl:sort select="name()"/>
      </xsl:apply-templates>
      <xsl:apply-templates>
        <xsl:sort select="name()"/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
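One way to try this stylesheet without the rest of the script is a small Java driver. The following is only a sketch: the class name `RecursiveSortDemo` and the `sort` helper are hypothetical, the stylesheet text is the one shown above, and the sample document is the input from the question with its whitespace removed.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical driver class; only the stylesheet text comes from the answer.
final class RecursiveSortDemo {

    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' indent='yes'/>"
      + "<xsl:strip-space elements='*'/>"
      + "<xsl:template match='node()[not(self::comment())][not(self::*)]|@*'>"
      + "<xsl:copy><xsl:apply-templates select='node()|@*'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='*'><xsl:copy>"
      + "<xsl:apply-templates select='@*'><xsl:sort select='name()'/></xsl:apply-templates>"
      + "<xsl:apply-templates><xsl:sort select='name()'/></xsl:apply-templates>"
      + "</xsl:copy></xsl:template>"
      + "</xsl:stylesheet>";

    // The example input from the question, written on fewer lines.
    static final String SAMPLE =
        "<foo b='b' a='a' c='c'>"
      + "<qwer><zxcv c='c' b='b'/><vcxz c='c' b='b'/></qwer>"
      + "<baz e='e' d='d'/>"
      + "<bar><fdsa g='g' f='f'/><asdf g='g' f='f'/></bar>"
      + "</foo>";

    // Runs the stylesheet over the given XML text and returns the serialized result.
    static String sort(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sort(SAMPLE));
    }
}
```

The printed document should list bar, baz, qwer under foo, with each of their children sorted in turn. The exact indentation, and the order in which attributes are serialized, depend on the XSLT processor in use, so it is worth checking those against your own JDK.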

The template <xsl:template match="node()[not(self::comment())][not(self::*)]|@*"> can actually be replaced with just <xsl:template match="processing-instruction()|@*">, which may be more readable. This works because "node()" matches elements, text nodes, comments and processing instructions. In your XSLT, elements are picked up by the other template, text nodes are handled by the built-in template, and comments are what you want to ignore, which leaves only processing instructions.
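As a rough check of that simplification, the sketch below runs the shortened stylesheet over a small document containing a comment, a processing instruction, and a text node. The class name `SimplifiedTemplateDemo`, the `transform` helper, and the test input are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class for the simplified match pattern.
final class SimplifiedTemplateDemo {

    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:strip-space elements='*'/>"
      // Only processing instructions and attributes are copied by this template.
      + "<xsl:template match='processing-instruction()|@*'>"
      + "<xsl:copy><xsl:apply-templates select='node()|@*'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='*'><xsl:copy>"
      + "<xsl:apply-templates select='@*'><xsl:sort select='name()'/></xsl:apply-templates>"
      + "<xsl:apply-templates><xsl:sort select='name()'/></xsl:apply-templates>"
      + "</xsl:copy></xsl:template>"
      + "</xsl:stylesheet>";

    // Runs the simplified stylesheet over the given XML text.
    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Comment should disappear; text and the processing instruction should survive.
        System.out.println(transform("<foo><!-- drop me --><?keep me?>text<b/><a/></foo>"));
    }
}
```

Comments match no explicit template, so the built-in rule for comments (which outputs nothing) applies, while the built-in rule for text nodes copies the text through.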
