Merging data from multiple XML files without duplicates using XSLT


Problem description

Similar questions have been asked, and I've read them and worked through tutorials, but I haven't been able to figure this out. I'm sure it's a matter of writing the correct XPath, but I can't seem to get it right. I'm trying to take a list of files (basically everything in a folder) and combine them into a different schema format. The trick is that part of the information from the individual files needs to be used as a lookup table in the resulting XML. My solution needs to be purely XSLT 1.0. It probably goes without saying that everything below is fictional, except maybe the structure of the "manifest" XML file, which looks like the following:

<files>
    <file>request1.xml</file>
    <file>request2.xml</file>
    <file>request3.xml</file>
</files>

Request1.xml might look like the following:

<?xml version="1.0" encoding="UTF-8"?>
<ProductList xmlns:pl="http://products.produsor.com/pml" xmlns:pi="http://standards.product.produsor.com/pml" createDateTime="2014-05-06T18:13:51.0Z" version="5.0">
    <pl:Request requestId="ADF87A9DF7" quantity="1">
        <pl:SystemIdentifier name="GUID">38DDF5C1-A049-44DB-9EEA-3F5CB831228D</pl:SystemIdentifier>
        <pl:SystemIdentifier name="UPC">4236483268</pl:SystemIdentifier>
        <pl:Product>
            <pl:Names>
                <pi.ProductNameLongDescription>Classic Design Round Dinning Table</pi.ProductNameLongDescription>
                <pi.ProductNameShort>Dinning Table</pi.ProductNameShort>
            </pl:Names>
            <pl:Description>
                <pi.ProductLongDescription>This is a really awesome table.</pi.ProductLongDescription>
                <pi.ProductShortDescription>It's made of wood</pi.ProductShortDescription>
            </pl:Description>
            <pl:Category>
                <pl:Name>Table</pl:Name>
                <pl:Description>This category is for tables</pl:Description>
                <pl:Priority>1</pl:Priority>
            </pl:Category>
            <pl:Category>
                <pl:Name>Dinning Furniture</pl:Name>
                <pl:Description>This category is for Dinning Furniture</pl:Description>
                <pl:Priority>2</pl:Priority>
            </pl:Category>
            <pl:Category>
                <pl:Name>Wood Furniture</pl:Name>
                <pl:Description>This category is for Wood Furniture</pl:Description>
                <pl:Priority>3</pl:Priority>
            </pl:Category>
        </pl:Product>
    </pl:Request>
    <pl:Request requestId="DA7FDAFDA9" quantity="1">
        <pl:SystemIdentifier name="GUID">DA7FDAFD-B049-45DB-9FFA-3F5CB834328D</pl:SystemIdentifier>
        <pl:SystemIdentifier name="UPC">4236483269</pl:SystemIdentifier>
        <pl:Product>
            <pl:Names>
                <pi.ProductNameLongDescription>Classic Design Round Coffee Table</pi.ProductNameLongDescription>
                <pi.ProductNameShort>Coffee Table</pi.ProductNameShort>
            </pl:Names>
            <pl:Description>
                <pi.ProductLongDescription>This is a really awesome table.</pi.ProductLongDescription>
                <pi.ProductShortDescription>It is made of wood</pi.ProductShortDescription>
            </pl:Description>
            <pl:Category>
                <pl:Name>Table</pl:Name>
                <pl:Description>This category is for tables</pl:Description>
                <pl:Priority>1</pl:Priority>
            </pl:Category>
            <pl:Category>
                <pl:Name>Living Room Furniture</pl:Name>
                <pl:Description>This category is for Dinning Furniture</pl:Description>
                <pl:Priority>4</pl:Priority>
            </pl:Category>
            <pl:Category>
                <pl:Name>Wood Furniture</pl:Name>
                <pl:Description>This category is for Wood Furniture</pl:Description>
                <pl:Priority>3</pl:Priority>
            </pl:Category>
        </pl:Product>
    </pl:Request>
</ProductList>

And Request2.xml would be something like this:

<?xml version="1.0" encoding="UTF-8"?>
<ProductList xmlns:pl="http://products.produsor.com/pml" xmlns:pi="http://standards.product.produsor.com/pml" createDateTime="2014-05-06T18:13:51.0Z" version="5.0">
    <pl:Request requestId="DFADF08D0A" quantity="10">
        <pl:SystemIdentifier name="GUID">38DDF5C1-A049-44DB-9EEA-3F5CB831228D</pl:SystemIdentifier>
        <pl:SystemIdentifier name="UPC">4236483268</pl:SystemIdentifier>
        <pl:Product>
            <pl:Names>
                <pi.ProductNameLongDescription>Classic Design Round Dinning Table</pi.ProductNameLongDescription>
                <pi.ProductNameShort>Dinning Table</pi.ProductNameShort>
            </pl:Names>
            <pl:Description>
                <pi.ProductLongDescription>This is a really awesome table.</pi.ProductLongDescription>
                <pi.ProductShortDescription>It's made of wood</pi.ProductShortDescription>
            </pl:Description>
            <pl:Category>
                <pl:Name>Table</pl:Name>
                <pl:Description>This category is for tables</pl:Description>
                <pl:Priority>1</pl:Priority>
            </pl:Category>
            <pl:Category>
                <pl:Name>Dinning Furniture</pl:Name>
                <pl:Description>This category is for Dinning Furniture</pl:Description>
                <pl:Priority>2</pl:Priority>
            </pl:Category>
            <pl:Category>
                <pl:Name>Wood Furniture</pl:Name>
                <pl:Description>This category is for Wood Furniture</pl:Description>
                <pl:Priority>3</pl:Priority>
            </pl:Category>
        </pl:Product>
    </pl:Request>
    <pl:Request requestId="RER7689EQ9" quantity="10">
        <pl:SystemIdentifier name="GUID">DA7FDAFD-B049-45DB-9FFA-3F5CB834328D</pl:SystemIdentifier>
        <pl:SystemIdentifier name="UPC">4236483269</pl:SystemIdentifier>
        <pl:Product>
            <pl:Names>
                <pi.ProductNameLongDescription>Classic Design Round Coffee Table</pi.ProductNameLongDescription>
                <pi.ProductNameShort>Coffee Table</pi.ProductNameShort>
            </pl:Names>
            <pl:Description>
                <pi.ProductLongDescription>This is a really awesome table.</pi.ProductLongDescription>
                <pi.ProductShortDescription>It is made of wood</pi.ProductShortDescription>
            </pl:Description>
            <pl:Category>
                <pl:Name>Table</pl:Name>
                <pl:Description>This category is for tables</pl:Description>
                <pl:Priority>1</pl:Priority>
            </pl:Category>
            <pl:Category>
                <pl:Name>Living Room Furniture</pl:Name>
                <pl:Description>This category is for Dinning Furniture</pl:Description>
                <pl:Priority>4</pl:Priority>
            </pl:Category>
            <pl:Category>
                <pl:Name>Wood Furniture</pl:Name>
                <pl:Description>This category is for Wood Furniture</pl:Description>
                <pl:Priority>3</pl:Priority>
            </pl:Category>
        </pl:Product>
    </pl:Request>
</ProductList>

What I want is the following:

<ProductList xmlns:pl="http://products.produsor.com/pml">
    <pl:Submission>
<!--********* This is the problem area *************-->
        <pl:Descriptions>
            <pl:Description id="1">This is a really awesome table.</pl:Description>
        </pl:Descriptions>
        <pl:Categories>
            <pl:Category id="1">Table</pl:Category>
            <pl:Category id="2">Dinning Furniture</pl:Category>
            <pl:Category id="3">Living Room Furniture</pl:Category>
            <pl:Category id="4">Wood Furniture</pl:Category>
        </pl:Categories>
<!--****************************************************-->
        <pl:Product>
            <pl:SystemIdentifier type="GUID">DA7FDAFD-B049-45DB-9FFA-3F5CB834328D</pl:SystemIdentifier>
            <pl:SystemIdentifier name="UPC">4236483268</pl:SystemIdentifier>
            <pl:ProductName descriptionId="1">Dinning Table</pl:ProductName>
            <cat catId="1"/>
            <cat catId="2"/>
            <cat catId="3"/>
        </pl:Product>
        <pl:Product>
            <pl:SystemIdentifier type="GUID">DA7FDAFD-B049-45DB-9FFA-3F5CB834328D</pl:SystemIdentifier>
            <pl:SystemIdentifier name="UPC">4236483268</pl:SystemIdentifier>
            <pl:ProductName descriptionId="1">Dinning Table</pl:ProductName>
            <cat catId="1"/>
            <cat catId="3"/>
            <cat catId="4"/>
        </pl:Product>       
        <pl:Product>
            <pl:SystemIdentifier type="GUID">DA7FDAFD-B049-45DB-9FFA-3F5CB834328D</pl:SystemIdentifier>
            <pl:SystemIdentifier name="UPC">4236483268</pl:SystemIdentifier>
            <pl:ProductName descriptionId="1">Dinning Table</pl:ProductName>
            <cat catId="1"/>
            <cat catId="2"/>
            <cat catId="3"/>
        </pl:Product>
        <pl:Product>
            <pl:SystemIdentifier type="GUID">DA7FDAFD-B049-45DB-9FFA-3F5CB834328D</pl:SystemIdentifier>
            <pl:SystemIdentifier name="UPC">4236483268</pl:SystemIdentifier>
            <pl:ProductName descriptionId="1">Dinning Table</pl:ProductName>
            <cat catId="1"/>
            <cat catId="3"/>
            <cat catId="4"/>
        </pl:Product>   
    </pl:Submission>
</ProductList>

The trick is that I can't have repeating values in the pl:Description or pl:Category tags. The product elements, on the other hand, must repeat if they are repeated in the files. I have XSLT templates that construct everything, including the descriptions and categories, but they do so once per file. I need the descriptions and categories built once, containing the distinct data from all of the files, followed by all of the product elements. Here is what I have so far, which builds the product elements.

<xsl:template match="/">
    <xsl:for-each select="/files/file">
        <xsl:apply-templates select="document(.)/ProductList/pl:Request"/>
    </xsl:for-each>
</xsl:template>

Since this is already pretty long, I'll just say that the Request template works to create the product elements, and I have a "ProductList" template that creates the descriptions and categories element structure.

Recommended answer

Here is an example that copies all categories into a result tree fragment, uses exsl:node-set and then Muenchian grouping to identify the unique categories, and then references them when copying the Request elements:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    xmlns:pl="http://products.produsor.com/pml"
    version="1.0"
    exclude-result-prefixes="exsl">

    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:variable name="input-docs" select="document(files/file)"/>

    <xsl:variable name="cats-rtf">
        <xsl:copy-of select="$input-docs//pl:Category"/>
    </xsl:variable>

    <xsl:key name="group" match="pl:Category" use="pl:Name"/>

    <xsl:variable name="distinct-cats-rtf">
        <xsl:for-each select="exsl:node-set($cats-rtf)/pl:Category[generate-id() = generate-id(key('group', pl:Name)[1])]">
            <pl:Category id="{position()}">
                <xsl:value-of select="pl:Name"/>
            </pl:Category>
        </xsl:for-each>
    </xsl:variable>

    <xsl:variable name="distinct-cats" select="exsl:node-set($distinct-cats-rtf)/pl:Category"/>

    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="/">
        <ProductList>
            <pl:Submission>
                <pl:Categories>
                    <xsl:copy-of select="$distinct-cats"/>
                </pl:Categories>
                <xsl:apply-templates select="$input-docs//pl:Request"/>
            </pl:Submission>
        </ProductList>
    </xsl:template>

    <xsl:template match="pl:Category">
        <cat catId="{$distinct-cats[. = current()/pl:Name]/@id}"/>
    </xsl:template>

</xsl:stylesheet>

You could use the same approach to identify the unique descriptions and reference them.
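As a rough illustration of that suggestion, here is a minimal sketch of the description half only, written against the sample files from the question. It is not part of the original answer. The key name desc-group and the variables descs-rtf, distinct-descs-rtf and distinct-descs are made up for this example, and it assumes that the text of pi.ProductLongDescription is what makes a description unique. Like the stylesheet above, it depends on the EXSLT exsl:node-set extension, so it only works in XSLT 1.0 processors that support it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    xmlns:pl="http://products.produsor.com/pml"
    version="1.0"
    exclude-result-prefixes="exsl">

    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- Load every file listed in the manifest, as in the stylesheet above. -->
    <xsl:variable name="input-docs" select="document(files/file)"/>

    <!-- Copy only product-level descriptions (not pl:Category/pl:Description)
         into one tree, so that a single key lookup sees all files at once. -->
    <xsl:variable name="descs-rtf">
        <xsl:copy-of select="$input-docs//pl:Product/pl:Description"/>
    </xsl:variable>

    <!-- Assumption: the long description text identifies a description.
         pi.ProductLongDescription is an unprefixed element name in the samples. -->
    <xsl:key name="desc-group" match="pl:Description" use="pi.ProductLongDescription"/>

    <!-- Muenchian grouping: keep the first pl:Description for each distinct text. -->
    <xsl:variable name="distinct-descs-rtf">
        <xsl:for-each select="exsl:node-set($descs-rtf)/pl:Description[generate-id() = generate-id(key('desc-group', pi.ProductLongDescription)[1])]">
            <pl:Description id="{position()}">
                <xsl:value-of select="pi.ProductLongDescription"/>
            </pl:Description>
        </xsl:for-each>
    </xsl:variable>

    <xsl:variable name="distinct-descs" select="exsl:node-set($distinct-descs-rtf)/pl:Description"/>

    <!-- Identity template: copy everything that is not handled below. -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="/">
        <ProductList>
            <pl:Submission>
                <pl:Descriptions>
                    <xsl:copy-of select="$distinct-descs"/>
                </pl:Descriptions>
                <xsl:apply-templates select="$input-docs//pl:Request"/>
            </pl:Submission>
        </ProductList>
    </xsl:template>

    <!-- Replace pl:Names with a pl:ProductName that points at the shared description. -->
    <xsl:template match="pl:Product/pl:Names">
        <pl:ProductName descriptionId="{$distinct-descs[. = current()/../pl:Description/pi.ProductLongDescription]/@id}">
            <xsl:value-of select="pi.ProductNameShort"/>
        </pl:ProductName>
    </xsl:template>

    <!-- The description text now lives in the lookup table, so drop the inline copy. -->
    <xsl:template match="pl:Product/pl:Description"/>

</xsl:stylesheet>
```

To combine this with the category handling, the extra variables, the key and the last two templates could be added to the stylesheet above, with the pl:Descriptions block placed before pl:Categories in the root template. The id values come from position(), so they depend on the order in which the processor returns the loaded documents.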
