XML to CSV with XSLT - Grouping nodes


Problem description

So I've got an XML file, generated from a PHP cURL response, that I then transform to CSV so that each mods element below becomes one line. I've got some CSV output using the stylesheet in the checked answer here, but it's not quite what I'm trying to do.

My XML (simplified):

<xml>
<mods xmlns="http://www.loc.gov/mods/">
      <typeOfResource>StillImage</typeOfResource>
      <titleInfo ID="T-1">
        <title>East Bay Street</title>
      </titleInfo>
      <subject ID="SBJ-2">
        <topic>Railroads</topic>
      </subject>
      <subject ID="SBJ-3">
        <geographic>Low Country</geographic>
      </subject>
      <subject ID="SBJ-4">
        <geographic>Charleston (S.C.)</geographic>
      </subject>
      <subject ID="SBJ-7">
        <hierarchicalGeographic>
          <county>Charleston County (S.C.)</county>
        </hierarchicalGeographic>
      </subject>
      <physicalDescription>
        <form>Images</form>
      </physicalDescription>
      <note>Caption: &apos;War Views. No.179.  Ruins of the Northeastern Railway Depot, Charleston.&apos;  This is a stereograph image which measures 3 1/2&quot; X 7&quot;.  Date assumed to be 1865.</note>
      <originInfo>
        <dateCreated>1865</dateCreated>
      </originInfo>
      <location>
        <physicalLocation>The Charleston Museum Archives</physicalLocation>
      </location>
      <relatedItem type="host">
        <titleInfo>
          <title>Charleston Museum Civil War Photographs</title>
        </titleInfo>
      </relatedItem>
    </mods>

   <mods>
     more nodes...
   </mods>
</xml>

My current XSL from the stack post above:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" encoding="iso-8859-1"/>

<xsl:strip-space elements="*" />

<xsl:template match="/*/child::*">
<xsl:for-each select="child::*">
<xsl:if test="position() != last()"><xsl:value-of select="normalize-space(.)"/>,        </xsl:if>
<xsl:if test="position()  = last()"><xsl:value-of select="normalize-space(.)"/>    <xsl:text>&#xD;</xsl:text>
</xsl:if>
</xsl:for-each>
</xsl:template>

 </xsl:stylesheet>

This outputs CSV where each MODS element is one line, and each child is a comma separated value on that line. Would it be possible to modify the XSL such that each MODS element is one line, but the values of matching children are grouped? Something like:

StillImage,East Bay Street,Railroads,**Low Country;Charleston (S.C.)**,Charleston County (S.C.),Images

.......and so on.

So when nodes (like the multiple subject -> geographic entries) match they are grouped and semicolon separated rather than taking up multiple comma separated values? Hopefully I'm making some sense. Thanks!

Solution

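One way to do this is to first change your XSLT so that it selects only the elements that start a group: the first element, plus any element whose immediately preceding sibling has a different child name. (The first element has to be included explicitly. For a leaf element such as typeOfResource, name(*) is an empty string, and so is the name taken from its non-existent preceding sibling, so a bare != comparison would skip it.)

<xsl:for-each select="*[not(preceding-sibling::*) or name(*) != name(preceding-sibling::*[1]/*)]">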

Then, you can define a variable to get the following sibling if (and only if) it has the same name, so you can then check if the current element is indeed in a group of more than 1.

<xsl:variable name="nextWithSameName" 
              select="following-sibling::*[1][name(*)=name(current()/*)]"/>
<xsl:if test="$nextWithSameName">**</xsl:if>

(I am not sure if you actually wanted the ** in the final results, or whether they are just there to highlight the group! I am keeping them in my example, but obviously it will be easy enough to remove the relevant lines of code).
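As a cross-check outside XSLT, the two tests above (does this element open a group, and does the next sibling belong to the same group) can be sketched in Python with the standard library. This is only an illustration under assumptions: the sample is a shortened, namespace-free stand-in for one mods record, and `child_name` and `group_starts` are hypothetical helper names, not part of any stylesheet.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free stand-in for one <mods> record (an assumption for brevity).
SAMPLE = """<mods>
  <typeOfResource>StillImage</typeOfResource>
  <subject><topic>Railroads</topic></subject>
  <subject><geographic>Low Country</geographic></subject>
  <subject><geographic>Charleston (S.C.)</geographic></subject>
  <physicalDescription><form>Images</form></physicalDescription>
</mods>"""


def child_name(elem):
    """Rough analogue of XPath name(*): tag of the first child element, or ''."""
    return elem[0].tag if len(elem) else ""


def group_starts(record):
    """Yield (text, has_next_in_group) for every element that opens a group."""
    kids = list(record)
    for i, elem in enumerate(kids):
        # Opens a group if it is the first element or the previous sibling's
        # child name differs (the role of the for-each predicate).
        if i == 0 or child_name(kids[i - 1]) != child_name(elem):
            nxt = kids[i + 1] if i + 1 < len(kids) else None
            # Analogue of $nextWithSameName being a non-empty node-set.
            grouped = nxt is not None and child_name(nxt) == child_name(elem)
            yield "".join(elem.itertext()).strip(), grouped


starts = list(group_starts(ET.fromstring(SAMPLE)))
for text, grouped in starts:
    print(text, "(group of 2+)" if grouped else "")
```

Only "Low Country" is flagged as opening a group of more than one element, and "Charleston (S.C.)" is not listed at all because it is not the first element of its group.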

To group together the following-siblings with the same name, you could call a recursive template for the first following-sibling

<xsl:apply-templates select="$nextWithSameName" mode="group"/>

Then, within this template, you would recursively apply it again (keeping mode="group") whenever the immediately following sibling has the same child name:

<xsl:template match="*" mode="group">
   <xsl:text>;</xsl:text>
   <xsl:value-of select="normalize-space(.)"/>
   <xsl:apply-templates select="following-sibling::*[1][name(*)=name(current()/*)]" mode="group"/>
</xsl:template>
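The recursion can also be traced in plain Python. The sketch below assumes a namespace-free sample with a third, made-up geographic entry ("South Carolina") so that the recursion runs more than one step; `group_tail`, `child_name` and `norm` are hypothetical names chosen for this illustration.

```python
import xml.etree.ElementTree as ET

# Namespace-free sample; the third <geographic> value is invented for illustration.
SAMPLE = """<mods>
  <subject><topic>Railroads</topic></subject>
  <subject><geographic>Low Country</geographic></subject>
  <subject><geographic>Charleston (S.C.)</geographic></subject>
  <subject><geographic>South Carolina</geographic></subject>
  <physicalDescription><form>Images</form></physicalDescription>
</mods>"""


def child_name(elem):
    """Rough analogue of XPath name(*): tag of the first child element, or ''."""
    return elem[0].tag if len(elem) else ""


def norm(elem):
    """Rough analogue of normalize-space(.): collapse runs of whitespace."""
    return " ".join("".join(elem.itertext()).split())


def group_tail(siblings, i):
    """Analogue of applying the mode="group" template to siblings[i]."""
    out = ";" + norm(siblings[i])
    nxt = i + 1
    # Recurse only while the immediately following sibling has the same child name.
    if nxt < len(siblings) and child_name(siblings[nxt]) == child_name(siblings[i]):
        out += group_tail(siblings, nxt)
    return out


kids = list(ET.fromstring(SAMPLE))
# kids[1] ("Low Country") opens the group, so the tail starts at kids[2].
tail = group_tail(kids, 2)
print(norm(kids[1]) + tail)  # Low Country;Charleston (S.C.);South Carolina
```

The recursion stops at physicalDescription because its child name (form) differs, which is the same stopping condition the predicate on following-sibling::*[1] expresses.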

Try the following XSLT:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="text" encoding="iso-8859-1"/>
   <xsl:strip-space elements="*"/>

   <!-- One output line per record (each child of the root element) -->
   <xsl:template match="/*/*">
      <!-- Visit only the elements that start a group; the first element is
           included explicitly so that a leaf such as typeOfResource is kept -->
      <xsl:for-each select="*[not(preceding-sibling::*) or name(*) != name(preceding-sibling::*[1]/*)]">
         <xsl:variable name="nextWithSameName" select="following-sibling::*[1][name(*)=name(current()/*)]"/>
         <!-- xsl:text keeps stray indentation out of the separator -->
         <xsl:if test="position() &gt; 1"><xsl:text>,</xsl:text></xsl:if>
         <xsl:if test="$nextWithSameName">**</xsl:if>
         <xsl:value-of select="normalize-space(.)"/>
         <xsl:apply-templates select="$nextWithSameName" mode="group"/>
         <xsl:if test="$nextWithSameName">**</xsl:if>
      </xsl:for-each>
      <xsl:text>&#xD;</xsl:text>
   </xsl:template>

   <!-- Appends ";value" for each further member of the current group -->
   <xsl:template match="*" mode="group">
      <xsl:text>;</xsl:text>
      <xsl:value-of select="normalize-space(.)"/>
      <xsl:apply-templates select="following-sibling::*[1][name(*)=name(current()/*)]" mode="group"/>
   </xsl:template>
</xsl:stylesheet>

Now, if you could use XSLT 2.0, things become much, much easier, as you could use the xsl:for-each-group construct, which among other things supports a group-adjacent attribute. You could also do away with the recursive template by using the improved xsl:value-of, whose separator attribute is used when multiple items are selected.

For XSLT 2.0, the following should also work

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text" encoding="iso-8859-1"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="/*/*">
        <xsl:for-each-group select="*" group-adjacent="name(*)">
            <xsl:if test="position() &gt; 1"><xsl:text>,</xsl:text></xsl:if>
            <xsl:if test="current-group()[2]">**</xsl:if>
            <xsl:value-of select="current-group()" separator=";" />
            <xsl:if test="current-group()[2]">**</xsl:if>
        </xsl:for-each-group>
        <xsl:text>&#xD;</xsl:text>
    </xsl:template>
</xsl:stylesheet>
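For readers more familiar with general-purpose languages, group-adjacent behaves much like Python's itertools.groupby, which also groups only consecutive items with equal keys. The sketch below is an analogy rather than a replacement for the stylesheet: it assumes a shortened, namespace-free copy of the sample record, and `to_row`, `child_name` and `norm` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

# Shortened, namespace-free copy of the sample record (an assumption for brevity).
SAMPLE = """<mods>
  <typeOfResource>StillImage</typeOfResource>
  <titleInfo><title>East Bay Street</title></titleInfo>
  <subject><topic>Railroads</topic></subject>
  <subject><geographic>Low Country</geographic></subject>
  <subject><geographic>Charleston (S.C.)</geographic></subject>
  <subject><hierarchicalGeographic><county>Charleston County (S.C.)</county></hierarchicalGeographic></subject>
  <physicalDescription><form>Images</form></physicalDescription>
</mods>"""


def child_name(elem):
    """Rough analogue of XPath name(*): tag of the first child element, or ''."""
    return elem[0].tag if len(elem) else ""


def norm(elem):
    """Collapse whitespace in the element's text content."""
    return " ".join("".join(elem.itertext()).split())


def to_row(record):
    """Build one CSV-like line: adjacent same-kind elements joined with ';'."""
    fields = []
    # groupby, like group-adjacent, only merges consecutive equal keys.
    for _, group in groupby(list(record), key=child_name):
        values = [norm(e) for e in group]
        joined = ";".join(values)  # plays the role of separator=";"
        # Wrap in ** only when the group has more than one member.
        fields.append(f"**{joined}**" if len(values) > 1 else joined)
    return ",".join(fields)


row = to_row(ET.fromstring(SAMPLE))
print(row)
# StillImage,East Bay Street,Railroads,**Low Country;Charleston (S.C.)**,Charleston County (S.C.),Images
```

The printed line matches the output requested in the question, which makes it a convenient reference when checking what either stylesheet produces for the same record.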
