Sort elements of arbitrary XML document recursively


Question


    I'm trying to sort and canonicalize some XML documents. The desired end result is that:

    1. every element's children are in alphabetical order
    2. every element's attributes are in alphabetical order
    3. comments are removed
    4. all elements are properly spaced (i.e. "pretty print").

    I have achieved all of these goals except #1.

    I have been using this answer as my template. Here is what I have so far:

    import javax.xml.transform.stream.StreamResult
    import javax.xml.transform.stream.StreamSource
    import javax.xml.transform.TransformerFactory
    import org.apache.xml.security.c14n.Canonicalizer
    
    // Initialize the security library
    org.apache.xml.security.Init.init()
    
    // Create some variables
    
    // Get arguments
    
    // Make sure required arguments have been provided
    
    if(!error) {
        // Create some variables
        def ext = fileInName.tokenize('.').last()
        fileOutName = fileOutName ?: "${fileInName.lastIndexOf('.').with {it != -1 ? fileInName[0..<it] : fileInName}}_CANONICALIZED_AND_SORTED.${ext}"
        def fileIn = new File(fileInName)
        def fileOut = new File(fileOutName)
        def xsltFile = new File(xsltName)
        def temp1 = new File("./temp1")
        def temp2 = new File("./temp2")
        def os
        def is
    
        // Sort the XML attributes, remove comments, and remove extra whitespace
        println "Canonicalizing..."
        Canonicalizer c = Canonicalizer.getInstance(Canonicalizer.ALGO_ID_C14N_OMIT_COMMENTS)
        os = temp1.newOutputStream()
        c.setWriter(os)
        c.canonicalize(fileIn.getBytes())
        os.close()
    
        // Sort the XML elements
        println "Sorting..."
        def factory = TransformerFactory.newInstance()
        is = xsltFile.newInputStream()
        def transformer = factory.newTransformer(new StreamSource(is))
        is.close()
        is = temp1.newInputStream()
        os = temp2.newOutputStream()
        transformer.transform(new StreamSource(is), new StreamResult(os))
        is.close()
        os.close()
    
        // Write the XML output in "pretty print"
        println "Beautifying..."
        def parser = new XmlParser()
        def printer = new XmlNodePrinter(new IndentPrinter(fileOut.newPrintWriter(), "    ", true))
        printer.print parser.parseText(temp2.getText())
    
        // Cleanup
        temp1.delete()
        temp2.delete()
    
        println "Done!"
    }
    

    Full script is here.

    XSLT:

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output indent="yes"/>
      <xsl:strip-space elements="*"/>
      <xsl:template match="node()|@*">
        <xsl:copy>
          <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
      </xsl:template>
      <xsl:template match="foo">
        <foo>
          <xsl:apply-templates>
            <xsl:sort select="name()"/>
          </xsl:apply-templates>
        </foo>
      </xsl:template>
    </xsl:stylesheet>
    

    Sample Input XML:

    <foo b="b" a="a" c="c">
        <qwer>
        <zxcv c="c" b="b"/>
        <vcxz c="c" b="b"/>
        </qwer>
        <baz e="e" d="d"/>
        <bar>
        <fdsa g="g" f="f"/>
        <asdf g="g" f="f"/>
        </bar>
    </foo>
    

    Desired Output XML:

    <foo a="a" b="b" c="c">
        <bar>
            <asdf f="f" g="g"/>
            <fdsa f="f" g="g"/>
        </bar>
        <baz d="d" e="e"/>
        <qwer>
            <vcxz b="b" c="c"/>
            <zxcv b="b" c="c"/>
        </qwer>
    </foo>
    

    How can I make the transform apply to all elements so all of an element's children will be in alphabetical order?

    Solution

    If you want the transform to apply to all elements, you need a template that matches all elements, rather than one that matches only the specific "foo" element:

    <xsl:template match="*">
    

    Note that you would then have to change the current template that matches "node()" so that it excludes elements:

     <xsl:template match="node()[not(self::*)]|@*">
    

    Within the new template you will also need code to select the attributes, because your "foo" template currently ignores them (<xsl:apply-templates/> without a select attribute processes only child nodes, and attributes are not children).
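
The point about attributes can be checked with a small sketch. It is not part of the original answer: the class name `AttributeDropDemo`, the helper `transform`, and the two inline stylesheets are illustrative assumptions, and it uses only the JDK's built-in `javax.xml.transform` API. The first stylesheet uses a bare `<xsl:apply-templates>`; the second also applies templates to `@*` and adds a rule that copies attributes.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class AttributeDropDemo {
    private static final String HEAD =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>";

    // Sorts child nodes by name but never selects attributes, so they are lost.
    static final String ELEMENTS_ONLY = HEAD
        + "<xsl:template match='*'><xsl:copy>"
        + "<xsl:apply-templates><xsl:sort select='name()'/></xsl:apply-templates>"
        + "</xsl:copy></xsl:template></xsl:stylesheet>";

    // Same rule, plus an explicit pass over @* and a rule that copies attributes.
    static final String WITH_ATTRIBUTES = HEAD
        + "<xsl:template match='@*'><xsl:copy/></xsl:template>"
        + "<xsl:template match='*'><xsl:copy>"
        + "<xsl:apply-templates select='@*'><xsl:sort select='name()'/></xsl:apply-templates>"
        + "<xsl:apply-templates><xsl:sort select='name()'/></xsl:apply-templates>"
        + "</xsl:copy></xsl:template></xsl:stylesheet>";

    // Hypothetical input, cut down from the sample in the question.
    static final String INPUT = "<foo b='b' a='a'><qwer/><bar/></foo>";

    // Applies the given stylesheet to the given XML and returns the serialized result.
    static String transform(String xslt, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(ELEMENTS_ONLY, INPUT));
        System.out.println(transform(WITH_ATTRIBUTES, INPUT));
    }
}
```

In both results `bar` should come before `qwer`, but only the second stylesheet should keep the attributes on `foo`.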

    Actually, looking at your requirements, items 1 to 3 can all be done with a single XSLT. For example, to remove comments, you could exclude them from the template that currently matches node():

    <xsl:template match="node()[not(self::comment())][not(self::*)]|@*">
    

    Try the following XSLT, which should achieve points 1 to 3:

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="xml" indent="yes"/>
      <xsl:strip-space elements="*"/>
    
      <xsl:template match="node()[not(self::comment())][not(self::*)]|@*">
        <xsl:copy>
          <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
      </xsl:template>
    
      <xsl:template match="*">
        <xsl:copy>
          <xsl:apply-templates select="@*">
            <xsl:sort select="name()"/>
          </xsl:apply-templates>
          <xsl:apply-templates>
            <xsl:sort select="name()"/>
          </xsl:apply-templates>
        </xsl:copy>
      </xsl:template>
    </xsl:stylesheet>
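
As a rough way to try this stylesheet outside the Groovy script, the sketch below feeds it a one-line variant of the sample input through the JDK's `javax.xml.transform` API. The class name `SortXmlDemo`, the helper `sortXml`, and the extra comment inside the sample are assumptions made for illustration. The serialized order of attributes is up to the XSLT processor, so the sketch only looks at element order and comment removal.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class SortXmlDemo {
    // The stylesheet from the answer, embedded as a string for the demo.
    static final String XSLT = String.join("\n",
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
        "  <xsl:output method='xml' indent='yes'/>",
        "  <xsl:strip-space elements='*'/>",
        "  <xsl:template match='node()[not(self::comment())][not(self::*)]|@*'>",
        "    <xsl:copy><xsl:apply-templates select='node()|@*'/></xsl:copy>",
        "  </xsl:template>",
        "  <xsl:template match='*'>",
        "    <xsl:copy>",
        "      <xsl:apply-templates select='@*'><xsl:sort select='name()'/></xsl:apply-templates>",
        "      <xsl:apply-templates><xsl:sort select='name()'/></xsl:apply-templates>",
        "    </xsl:copy>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    // Sample input from the question on one line, with a comment added for the demo.
    static final String SAMPLE =
        "<foo b='b' a='a' c='c'><!-- note -->"
        + "<qwer><zxcv c='c' b='b'/><vcxz c='c' b='b'/></qwer>"
        + "<baz e='e' d='d'/>"
        + "<bar><fdsa g='g' f='f'/><asdf g='g' f='f'/></bar></foo>";

    // Runs the stylesheet over the given XML text and returns the result as a string.
    static String sortXml(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sortXml(SAMPLE));
    }
}
```

The printed document should list `bar`, `baz`, `qwer` in that order under `foo`, with their children sorted the same way and the comment gone. How much indentation `indent="yes"` produces varies between processors and JDK versions.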
    

    EDIT: The template <xsl:template match="node()[not(self::comment())][not(self::*)]|@*"> can actually be replaced with just <xsl:template match="processing-instruction()|@*">, which may be more readable. This works because "node()" matches elements, text nodes, comments and processing instructions. In your XSLT, elements are picked up by the other template, text nodes are copied by the built-in template, and comments are the nodes you want to ignore, which leaves just processing instructions.
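
The simplified match can be exercised the same way. This is only a sketch under assumptions: the class name `SimplifiedMatchDemo`, the helper `transform`, and the input document (which mixes a comment, a processing instruction, text and elements) are all invented for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class SimplifiedMatchDemo {
    // Variant of the answer's stylesheet using the shorter match pattern.
    static final String XSLT = String.join("\n",
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
        "  <xsl:strip-space elements='*'/>",
        "  <xsl:template match='processing-instruction()|@*'>",
        "    <xsl:copy><xsl:apply-templates select='node()|@*'/></xsl:copy>",
        "  </xsl:template>",
        "  <xsl:template match='*'>",
        "    <xsl:copy>",
        "      <xsl:apply-templates select='@*'><xsl:sort select='name()'/></xsl:apply-templates>",
        "      <xsl:apply-templates><xsl:sort select='name()'/></xsl:apply-templates>",
        "    </xsl:copy>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    // Hypothetical input covering each node kind mentioned in the explanation.
    static final String INPUT =
        "<root><!-- drop me --><?keep data?><b>text</b><a/></root>";

    // Applies the stylesheet above to the given XML and returns the serialized result.
    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(INPUT));
    }
}
```

The result should keep the `keep` processing instruction and the text inside `b`, drop the comment, and place `a` before `b`. Since processing instructions also have a name (their target), they take part in the name-based sort alongside the elements.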
