Generic XML to CSV conversion


Problem description

I am trying to convert a dynamic XML to CSV. I searched for various options to achieve this but did not find a suitable answer.

The structure of the XML is dynamic: it can be product data, geography data, or anything similar. So I am not able to use a predefined XSL stylesheet or a Castor conversion.

The tag names should form the header of the CSV. For example :

<Ctry>
  <datarow>
     <CtryName>Ctry1</CtryName>
     <CtryID>12361</CtryID>
    <State>
      <datarow>
         <StateName>State1</StateName>
         <StateID>12361</StateID>
        <City>
           <datarow>
              <CityName>City1</CityName>
               <CityID>12361</CityID>
           </datarow>
        </City>
      </datarow>
      <datarow>
         <StateName>State2</StateName>
         <StateID>12361</StateID>
      </datarow>
      </State>
  </datarow>
</Ctry>

The CSV should look like :

Header: CtryName   CtryID     StateName  StateID     CityName   CityID
Row1:   Ctry1       12361     State1     12361       City1      12361
Row2:   Ctry1       12361     State2     12361  

Could you please recommend a suitable tool or approach for this problem?

Solution

Below is a transcript illustrating the execution of a generic stylesheet that does such a conversion. The only assumption the stylesheet makes is the element name <datarow>. Given the structure and the requested result, the child elements of each <datarow> that have no element children of their own are treated as the CSV fields:

Data:

  T:\ftemp>type xml2csv.xml 
  <Ctry>
    <datarow>
       <CtryName>Ctry1</CtryName>
       <CtryID>12361</CtryID>
      <State>
        <datarow>
           <StateName>State1</StateName>
           <StateID>12361</StateID>
          <City>
             <datarow>
                <CityName>City1</CityName>
                 <CityID>12361</CityID>
             </datarow>
          </City>
        </datarow>
        <datarow>
           <StateName>State2</StateName>
           <StateID>12361</StateID>
        </datarow>
        </State>
    </datarow>
  </Ctry>

Execution:

  T:\ftemp>call xslt2 xml2csv.xml xml2csv.xsl 
  CtryName,CtryID,StateName,StateID,CityName,CityID
  Ctry1,12361,State1,12361,City1,12361
  Ctry1,12361,State2,12361
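
For readers who want to check the input-to-output mapping without an XSLT processor, here is a rough Python sketch of the same idea. It is not part of the original answer and is not a port of the stylesheet: the names ROW_TAG, leaf_fields, nested_rows, flatten, and xml_to_csv are hypothetical, and it assumes that in each <datarow> the plain fields appear before any nested container element, as they do in the sample data.

```python
import csv
import io
import xml.etree.ElementTree as ET

ROW_TAG = "datarow"  # the only element name the approach relies on

SAMPLE = """<Ctry>
  <datarow>
    <CtryName>Ctry1</CtryName>
    <CtryID>12361</CtryID>
    <State>
      <datarow>
        <StateName>State1</StateName>
        <StateID>12361</StateID>
        <City>
          <datarow>
            <CityName>City1</CityName>
            <CityID>12361</CityID>
          </datarow>
        </City>
      </datarow>
      <datarow>
        <StateName>State2</StateName>
        <StateID>12361</StateID>
      </datarow>
    </State>
  </datarow>
</Ctry>"""


def leaf_fields(row):
    """Return (name, text) for each child of a datarow that has no element children."""
    return [(child.tag, child.text or "") for child in row if len(child) == 0]


def nested_rows(row):
    """Return the datarow elements found directly inside this row's container children."""
    return [inner
            for child in row if len(child) > 0
            for inner in child.findall(ROW_TAG)]


def flatten(row, inherited=()):
    """Yield one list of values per innermost datarow, prefixed by its ancestors' values."""
    values = list(inherited) + [text for _, text in leaf_fields(row)]
    children = nested_rows(row)
    if not children:
        yield values
    for inner in children:
        yield from flatten(inner, values)


def xml_to_csv(xml_text):
    """Convert the XML text to CSV text: a header line, then one line per flattened row."""
    root = ET.fromstring(xml_text)
    # Header: distinct leaf-field names under any datarow, in document order.
    header = []
    for row in root.iter(ROW_TAG):
        for name, _ in leaf_fields(row):
            if name not in header:
                header.append(name)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")  # csv.writer quotes fields when needed
    writer.writerow(header)
    for row in root.findall(ROW_TAG):
        writer.writerows(flatten(row))
    return out.getvalue()
```

Calling print(xml_to_csv(SAMPLE), end="") on the sample data gives the same three lines as the transcript above. Like the stylesheet, the sketch leaves short rows unpadded.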

Stylesheet:

  T:\ftemp>type xml2csv.xsl 
  <?xml version="1.0" encoding="US-ASCII"?>
  <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                  version="2.0">

  <xsl:output method="text"/>

  <xsl:variable name="fields" 
                select="distinct-values(//datarow/*[not(*)]/name(.))"/>

  <xsl:template match="/">
    <!--header row-->
    <xsl:value-of select="$fields" separator=","/>

    <!--body-->
    <xsl:apply-templates select="*"/>

    <!--final line terminator-->
    <xsl:text>&#xa;</xsl:text>
  </xsl:template>

  <!--elements only process elements, not text-->
  <xsl:template match="*">
    <xsl:apply-templates select="*"/>
  </xsl:template>

  <!--these elements are CSV fields-->
  <xsl:template match="datarow/*[not(*)]">
    <!--replicate ancestors if necessary-->
    <xsl:if test="position()=1 and ../preceding-sibling::datarow">
      <xsl:for-each select="ancestor::datarow[position()>1]/*[not(*)]">
        <xsl:call-template name="doThisField"/>
      </xsl:for-each>
    </xsl:if>
    <xsl:call-template name="doThisField"/>
  </xsl:template>

  <!--put out a field ending the previous field and escaping content-->
  <xsl:template name="doThisField">
    <xsl:choose>
      <xsl:when test="name(.)=$fields[1]">
        <!--previous line terminator-->
        <xsl:text>&#xa;</xsl:text>
      </xsl:when>
      <xsl:otherwise>
        <!--previous field terminator-->
        <xsl:text>,</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
    <!--field value escaped per RFC4180-->
    <xsl:choose>
      <xsl:when test="contains(.,'&#x22;') or 
                      contains(.,',') or
                      contains(.,'&#xa;')">
        <xsl:text>"</xsl:text>
        <xsl:value-of select="replace(.,'&#x22;','&#x22;&#x22;')"/>
        <xsl:text>"</xsl:text>
      </xsl:when>
      <xsl:otherwise><xsl:value-of select="."/></xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  </xsl:stylesheet>

Note that the above code escapes the individual fields per RFC4180.
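
The quoting rule applied by the doThisField template can be restated in a few lines of Python. This is only an illustration of the three checks the stylesheet performs (double quote, comma, line feed); the function name escape_field is hypothetical and does not appear in the original answer.

```python
def escape_field(value: str) -> str:
    """Quote a field only if it contains a double quote, a comma, or a line feed."""
    if '"' in value or "," in value or "\n" in value:
        # Wrap in double quotes and double every embedded double quote.
        return '"' + value.replace('"', '""') + '"'
    return value
```

For example, escape_field('a,b') returns "a,b" with the surrounding quotes included, while a value such as Ctry1 is returned unchanged.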

My profile has a link to my web site where you will find a directory of free XML resources including an XSLT stylesheet to convert RFC4180 CSV files into XML files.

Here is an XSLT 1.0 version of the solution, as requested by the original poster:

t:\ftemp>type xml2csv1.xsl 
<?xml version="1.0" encoding="US-ASCII"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

<xsl:output method="text"/>

<xsl:variable name="firstFieldName" 
              select="name((//datarow/*[not(*)])[1])"/>

<xsl:key name="names" match="datarow/*[not(*)]" use="name(.)"/>

<xsl:template match="/">
  <!--header row-->
  <xsl:for-each select="//datarow/*[not(*)]
                        [generate-id(.)=
                         generate-id(key('names',name(.))[1])]">
    <xsl:if test="position()>1">,</xsl:if>
    <xsl:value-of select="name(.)"/>
  </xsl:for-each>

  <!--body-->
  <xsl:apply-templates select="*"/>

  <!--final line terminator-->
  <xsl:text>&#xa;</xsl:text>
</xsl:template>

<!--elements only process elements, not text-->
<xsl:template match="*">
  <xsl:apply-templates select="*"/>
</xsl:template>

<!--these elements are CSV fields-->
<xsl:template match="datarow/*[not(*)]">
  <!--replicate ancestors if necessary-->
  <xsl:if test="position()=1 and ../preceding-sibling::datarow">
    <xsl:for-each select="ancestor::datarow[position()>1]/*[not(*)]">
      <xsl:call-template name="doThisField"/>
    </xsl:for-each>
  </xsl:if>
  <xsl:call-template name="doThisField"/>
</xsl:template>

<!--put out a field ending the previous field and escaping content-->
<xsl:template name="doThisField">
  <xsl:choose>
    <xsl:when test="name(.)=$firstFieldName">
      <!--previous line terminator-->
      <xsl:text>&#xa;</xsl:text>
    </xsl:when>
    <xsl:otherwise>
      <!--previous field terminator-->
      <xsl:text>,</xsl:text>
    </xsl:otherwise>
  </xsl:choose>
  <!--field value escaped per RFC4180-->
  <xsl:choose>
    <xsl:when test="contains(.,'&#x22;') or 
                    contains(.,',') or
                    contains(.,'&#xa;')">
      <xsl:text>"</xsl:text>
      <xsl:call-template name="escapeQuote"/>
      <xsl:text>"</xsl:text>
    </xsl:when>
    <xsl:otherwise><xsl:value-of select="."/></xsl:otherwise>
  </xsl:choose>
</xsl:template>

<!--escape a double quote in the current node value with two double quotes-->
<xsl:template name="escapeQuote">
  <xsl:param name="rest" select="."/>
  <xsl:choose>
    <xsl:when test="contains($rest,'&#x22;')">
      <xsl:value-of select="substring-before($rest,'&#x22;')"/>
      <xsl:text>""</xsl:text>
      <xsl:call-template name="escapeQuote">
        <xsl:with-param name="rest" select="substring-after($rest,'&#x22;')"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$rest"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

</xsl:stylesheet>
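
The main difference in the 1.0 version is the escapeQuote template: XSLT 1.0 has no replace() function, so the quotes are doubled by recursing over the text with substring-before() and substring-after(). A Python rendering of that recursion, given only as an illustration (the function name escape_quote is hypothetical), looks like this:

```python
def escape_quote(rest: str) -> str:
    """Double every double quote by recursing on the text after each one."""
    if '"' in rest:
        # partition() plays the role of substring-before / substring-after.
        before, _, after = rest.partition('"')
        return before + '""' + escape_quote(after)
    return rest
```

The result is the same as rest.replace('"', '""'), which is what the replace() call in the 2.0 stylesheet does in a single step.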
