How to flatten an XML file into a set of XPath expressions?


Question

Consider the following example XML file:

<ns1:create xmlns:ns1='http://predic8.com/wsdl/material/ArticleService/1/'>
   <article xmlns:ns1='http://predic8.com/material/1/'>
      <name xmlns:ns1='http://predic8.com/material/1/'>foo</name>
      <description xmlns:ns1='http://predic8.com/material/1/'>bar</description>
      <price xmlns:ns1='http://predic8.com/common/1/'>
         <amount xmlns:ns1='http://predic8.com/common/1/'>00.00</amount>
         <currency xmlns:ns1='http://predic8.com/common/1/'>USD</currency>
      </price>
      <id xmlns:ns1='http://predic8.com/material/1/'>1</id>
   </article>
</ns1:create>

What would be the best (most efficient) way to flatten this into a set of XPath expressions? Note that I want to ignore any namespace and attribute information. (If needed, this could also be done as a pre-processing step.)

So I would like to get this output:

/create/article/name
/create/article/description
/create/article/price/amount
/create/article/price/currency
/create/article/id

I am implementing this in Java.

Edit: I might also need this to work when the elements contain no text data. For example, the following should generate the same output as above:

<ns1:create xmlns:ns1='http://predic8.com/wsdl/material/ArticleService/1/'>
  <article xmlns:ns1='http://predic8.com/material/1/'>
    <name />
    <description />
    <price xmlns:ns1='http://predic8.com/common/1/'>
      <amount />
      <currency xmlns:ns1='http://predic8.com/common/1/'></currency>
    </price>
    <id xmlns:ns1='http://predic8.com/material/1/'></id>
  </article>
</ns1:create>


Recommended answer

You could do this pretty easily with XSLT. Looking at your examples, it seems you only want the XPath of elements that contain text. If that's not the case, let me know and I can update the XSLT.

I created a new input example to show how it handles siblings with the same name, in this case <article>. Each step in a generated path carries a positional predicate such as [2], so the paths stay unambiguous.

XML Input

<ns1:create xmlns:ns1='http://predic8.com/wsdl/material/ArticleService/1/'>
    <article xmlns:ns1='http://predic8.com/material/1/'>
        <name xmlns:ns1='http://predic8.com/material/1/'>foo</name>
        <description xmlns:ns1='http://predic8.com/material/1/'>bar</description>
        <price xmlns:ns1='http://predic8.com/common/1/'>
            <amount xmlns:ns1='http://predic8.com/common/1/'>00.00</amount>
            <currency xmlns:ns1='http://predic8.com/common/1/'>USD</currency>
        </price>
        <id xmlns:ns1='http://predic8.com/material/1/'>1</id>
    </article>
    <article xmlns:ns1='http://predic8.com/material/2/'>
        <name xmlns:ns1='http://predic8.com/material/2/'>some name</name>
        <description xmlns:ns1='http://predic8.com/material/2/'>some description</description>
        <price xmlns:ns1='http://predic8.com/common/2/'>
            <amount xmlns:ns1='http://predic8.com/common/2/'>00.01</amount>
            <currency xmlns:ns1='http://predic8.com/common/2/'>USD</currency>
        </price>
        <id xmlns:ns1='http://predic8.com/material/2/'>2</id>
    </article>
</ns1:create>

XSLT 1.0

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="text()"/>

    <xsl:template match="*[text()]">
        <xsl:call-template name="genPath"/>
        <xsl:apply-templates select="node()"/>
    </xsl:template>

    <xsl:template name="genPath">
        <xsl:param name="prevPath"/>
        <xsl:variable name="currPath" select="concat('/',local-name(),'[',
        count(preceding-sibling::*[local-name() = local-name(current())])+1,']',$prevPath)"/>
        <xsl:for-each select="parent::*">
            <xsl:call-template name="genPath">
                <xsl:with-param name="prevPath" select="$currPath"/>
            </xsl:call-template>
        </xsl:for-each>
        <xsl:if test="not(parent::*)">
            <xsl:value-of select="$currPath"/>
            <xsl:text>&#xA;</xsl:text>
        </xsl:if>
    </xsl:template>

</xsl:stylesheet>

Output

/create[1]/article[1]/name[1]
/create[1]/article[1]/description[1]
/create[1]/article[1]/price[1]/amount[1]
/create[1]/article[1]/price[1]/currency[1]
/create[1]/article[1]/id[1]
/create[1]/article[2]/name[1]
/create[1]/article[2]/description[1]
/create[1]/article[2]/price[1]/amount[1]
/create[1]/article[2]/price[1]/currency[1]
/create[1]/article[2]/id[1]
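
Since the question mentions Java, here is a minimal sketch of running such a stylesheet through the JDK's built-in JAXP transformer. The class name XsltFlattener and the method name flatten are made up for illustration, and the stylesheet is inlined only to keep the example self-contained. It is essentially the stylesheet above, using local-name() throughout and not selecting attributes, so treat it as an assumption rather than a verbatim copy.

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

// Hypothetical helper class; not part of the original answer.
class XsltFlattener {
    // Stylesheet inlined for the sketch; in practice it would more likely
    // be loaded from a file or classpath resource.
    static final String STYLESHEET = String.join("\n",
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
        "  <xsl:output method='text'/>",
        "  <xsl:strip-space elements='*'/>",
        "  <xsl:template match='text()'/>",
        "  <xsl:template match='*[text()]'>",
        "    <xsl:call-template name='genPath'/>",
        "    <xsl:apply-templates select='node()'/>",
        "  </xsl:template>",
        "  <xsl:template name='genPath'>",
        "    <xsl:param name='prevPath'/>",
        "    <xsl:variable name='currPath' select=\"concat('/',local-name(),'[',",
        "      count(preceding-sibling::*[local-name() = local-name(current())])+1,']',$prevPath)\"/>",
        "    <xsl:for-each select='parent::*'>",
        "      <xsl:call-template name='genPath'>",
        "        <xsl:with-param name='prevPath' select='$currPath'/>",
        "      </xsl:call-template>",
        "    </xsl:for-each>",
        "    <xsl:if test='not(parent::*)'>",
        "      <xsl:value-of select='$currPath'/>",
        "      <xsl:text>&#xA;</xsl:text>",
        "    </xsl:if>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    // Applies the given XSLT to the given XML and returns the text output.
    static String flatten(String xslt, String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<ns1:create xmlns:ns1='http://predic8.com/wsdl/material/ArticleService/1/'>"
                + "<article><name>foo</name><price><amount>00.00</amount></price></article>"
                + "</ns1:create>";
        // Prints one path per line for every element that contains text.
        System.out.print(flatten(STYLESHEET, xml));
    }
}
```

If the same stylesheet is applied to many documents, compiling it once with TransformerFactory.newTemplates and creating a Transformer per document from the resulting Templates object is usually cheaper than recompiling it each time.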

Update

To make the XSLT work for all elements, remove the [text()] predicate from match="*[text()]". This outputs a path for every element. If you don't want paths for elements that contain other elements (like create, article, and price), add the predicate [not(*)]. Here's an updated example:

New XML Input

<ns1:create xmlns:ns1='http://predic8.com/wsdl/material/ArticleService/1/'>
    <article xmlns:ns1='http://predic8.com/material/1/'>
        <name />
        <description />
        <price xmlns:ns1='http://predic8.com/common/1/'>
            <amount />
            <currency xmlns:ns1='http://predic8.com/common/1/'></currency>
        </price>
        <id xmlns:ns1='http://predic8.com/material/1/'></id>
    </article>
    <article xmlns:ns1='http://predic8.com/material/2/'>
        <name xmlns:ns1='http://predic8.com/material/2/'>some name</name>
        <description xmlns:ns1='http://predic8.com/material/2/'>some description</description>
        <price xmlns:ns1='http://predic8.com/common/2/'>
            <amount xmlns:ns1='http://predic8.com/common/2/'>00.01</amount>
            <currency xmlns:ns1='http://predic8.com/common/2/'>USD</currency>
        </price>
        <id xmlns:ns1='http://predic8.com/material/2/'>2</id>
    </article>
</ns1:create>

XSLT 1.0

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="text()"/>

    <xsl:template match="*[not(*)]">
        <xsl:call-template name="genPath"/>
        <xsl:apply-templates select="node()"/>
    </xsl:template>

    <xsl:template name="genPath">
        <xsl:param name="prevPath"/>
        <xsl:variable name="currPath" select="concat('/',local-name(),'[',
            count(preceding-sibling::*[local-name() = local-name(current())])+1,']',$prevPath)"/>
        <xsl:for-each select="parent::*">
            <xsl:call-template name="genPath">
                <xsl:with-param name="prevPath" select="$currPath"/>
            </xsl:call-template>
        </xsl:for-each>
        <xsl:if test="not(parent::*)">
            <xsl:value-of select="$currPath"/>
            <xsl:text>&#xA;</xsl:text>
        </xsl:if>
    </xsl:template>

</xsl:stylesheet>

Output

/create[1]/article[1]/name[1]
/create[1]/article[1]/description[1]
/create[1]/article[1]/price[1]/amount[1]
/create[1]/article[1]/price[1]/currency[1]
/create[1]/article[1]/id[1]
/create[1]/article[2]/name[1]
/create[1]/article[2]/description[1]
/create[1]/article[2]/price[1]/amount[1]
/create[1]/article[2]/price[1]/currency[1]
/create[1]/article[2]/id[1]

If you remove the [not(*)] predicate, this is what the output looks like (a path is output for every element):

/create[1]
/create[1]/article[1]
/create[1]/article[1]/name[1]
/create[1]/article[1]/description[1]
/create[1]/article[1]/price[1]
/create[1]/article[1]/price[1]/amount[1]
/create[1]/article[1]/price[1]/currency[1]
/create[1]/article[1]/id[1]
/create[1]/article[2]
/create[1]/article[2]/name[1]
/create[1]/article[2]/description[1]
/create[1]/article[2]/price[1]
/create[1]/article[2]/price[1]/amount[1]
/create[1]/article[2]/price[1]/currency[1]
/create[1]/article[2]/id[1]
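
For comparison, the same two behaviours (leaf elements only versus every element) can be sketched in plain Java with a recursive DOM walk. This is an alternative to the XSLT, not something from the original answer; the class DomFlattener and the leavesOnly flag are hypothetical names, with leavesOnly playing the role of the [not(*)] predicate.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; a DOM-based counterpart to the stylesheets.
class DomFlattener {
    static List<String> flatten(String xml, boolean leavesOnly) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Namespace awareness is required for getLocalName() to drop prefixes.
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<String> paths = new ArrayList<>();
        walk(doc.getDocumentElement(), "", paths, leavesOnly);
        return paths;
    }

    // Pre-order traversal: emit the element's own path, then visit its children.
    private static void walk(Element e, String parentPath, List<String> paths, boolean leavesOnly) {
        String path = parentPath + "/" + e.getLocalName() + "[" + position(e) + "]";
        boolean leaf = true;
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                leaf = false;
                break;
            }
        }
        if (leaf || !leavesOnly) {
            paths.add(path);
        }
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                walk((Element) c, path, paths, leavesOnly);
            }
        }
    }

    // 1-based index among preceding sibling elements with the same local name.
    private static int position(Element e) {
        int pos = 1;
        for (Node s = e.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
            if (s.getNodeType() == Node.ELEMENT_NODE
                    && e.getLocalName().equals(s.getLocalName())) {
                pos++;
            }
        }
        return pos;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<ns1:create xmlns:ns1='urn:example'>"
                + "<article><name/><price><amount/></price></article>"
                + "<article><id>2</id></article></ns1:create>";
        flatten(xml, true).forEach(System.out::println);   // leaf elements only
        flatten(xml, false).forEach(System.out::println);  // every element
    }
}
```

Attributes are never visited, and empty elements such as <name /> count as leaves, which matches the edit in the question.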

Here's another version of the XSLT that is about 65% faster. It builds each path with a single pass over ancestor-or-self::* instead of a recursive named template:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="text()"/>

    <xsl:template match="*[not(*)]">
        <xsl:for-each select="ancestor-or-self::*">
            <xsl:value-of select="concat('/',local-name(),'[',count(preceding-sibling::*[local-name()=local-name(current())])+1,']')"/>
        </xsl:for-each>
        <xsl:text>&#xA;</xsl:text>
        <xsl:apply-templates select="node()"/>
    </xsl:template>

</xsl:stylesheet>
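
If efficiency is the main concern and the documents are large, a streaming parser may be worth considering, because it avoids building a tree and avoids re-counting preceding siblings for every step. The following is a sketch using StAX from the JDK, not part of the original answer; the class StaxFlattener and the method leafPaths are hypothetical names. It keeps a per-parent counter of child names and treats an element as a leaf when its end tag arrives with no start tag in between.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; a streaming counterpart to the [not(*)] stylesheet.
class StaxFlattener {
    static List<String> leafPaths(String xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> result = new ArrayList<>();
        Deque<String> openPaths = new ArrayDeque<>();                 // path of each open element
        Deque<Map<String, Integer>> childCounts = new ArrayDeque<>(); // name counters per open parent
        childCounts.push(new HashMap<>());                            // counters for the document level
        boolean lastWasStart = false;

        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = reader.getLocalName();                  // prefix and namespace are ignored
                int index = childCounts.peek().merge(name, 1, Integer::sum);
                String parent = openPaths.isEmpty() ? "" : openPaths.peek();
                openPaths.push(parent + "/" + name + "[" + index + "]");
                childCounts.push(new HashMap<>());                    // fresh counters for its children
                lastWasStart = true;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                if (lastWasStart) {
                    result.add(openPaths.peek());                     // no child element was seen: a leaf
                }
                openPaths.pop();
                childCounts.pop();
                lastWasStart = false;
            }
            // Text, comments and other events do not affect the leaf check.
        }
        reader.close();
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<ns1:create xmlns:ns1='urn:example'>"
                + "<article><name/><price><amount>1</amount></price></article>"
                + "<article><id>2</id></article></ns1:create>";
        leafPaths(xml).forEach(System.out::println);
    }
}
```

Whether this is actually faster than the XSLT for a given workload is something to measure rather than assume; the sketch only shows that the paths can be produced in a single forward pass.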
