Use XSLT to convert delimited text to XML

Question

I have some double-pipe-delimited data inside some XML tags, and I would like to replace/convert the delimited text to XML.
The delimited text also uses a colon to separate the heading from the data, like so: ||tagname:data||
The headings or tag names could be anything; this is just one example. I don't know in advance what I'm getting, so I must take whatever is listed in front of the colon and use that as the element name.

 <doc>
      <arr name="content">
      <str>  stream_source_info docname   stream_content_type text/html   stream_size 412   Content-Encoding ISO-8859-1   stream_name docname   Content-Type text/html; charset=ISO-8859-1   resourceName docname       ||phone:3282||email:Lori.KS@.edu||officenumber:D-107A||vcard:https://c3qa/profiles/vcard/profile.do?key=5c28d263-d8aa-4a8a-ae90-4e8b13de7a0b||photo:https://c3qa/profiles/photo.do?key=5c28d263-d8aa-4a8a-ae90-4e8b13de7a0b&amp;lastMod=1348674215846||pronunciation:https://c3qa/profiles/audio.do?key=5c28d263-d8aa-4a8a-ae90-4e8b13de7a0b&amp;lastMod=1348674215846||  </str>
    </arr>
</doc>  

Can I use XSLT to transform this XML into the following?

 <doc>
      <arr name="content">
      <str>  stream_source_info docname   stream_content_type text/html   stream_size 412   Content-Encoding ISO-8859-1   stream_name docname   Content-Type text/html; charset=ISO-8859-1   resourceName docname  
          <phone>3282</phone>
          <email>Lori.KS@.edu</email>
          <officenumber>D-107A</officenumber>
          <vcard>https://c3qa/profiles/vcard/profile.do?key=5c28d263-d8aa-4a8a-ae90-4e8b13de7a0b</vcard>
          <photo>https://c3qa/profiles/photo.do?key=5c28d263-d8aa-4a8a-ae90-4e8b13de7a0b&amp;lastMod=1348674215846</photo>
          <pronunciation>https://c3qa/profiles/audio.do?key=5c28d263-d8aa-4a8a-ae90-4e8b13de7a0b&amp;lastMod=1348674215846</pronunciation>
      </str>
    </arr>
</doc>  

The URLs will have to be wrapped in CDATA, and the delimited version will have to be replaced.
Can someone point me in the right direction? Thank you.

Recommended answer

The XSLT 2.0 instruction xsl:analyze-string can help. With Saxon 9.5, the stylesheet

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output indent="yes"/>

<xsl:template match="node()|@*">
    <xsl:copy>
        <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
</xsl:template>

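<xsl:template match="str">
  <xsl:copy>
    <!-- Outer pass: match the whole run of ||name:value|| fields. -->
    <xsl:analyze-string select="." regex="\|((\|[^|]+\|)+)\|">
      <xsl:matching-substring>
        <!-- Inner pass: split the run into single |name:value| fields. -->
        <xsl:analyze-string select="regex-group(1)" regex="\|(\w+):([^|]+)\|">
          <xsl:matching-substring>
            <!-- The text before the first colon becomes the element name. -->
            <xsl:element name="{regex-group(1)}">
              <xsl:value-of select="regex-group(2)"/>
            </xsl:element>
          </xsl:matching-substring>
        </xsl:analyze-string>
      </xsl:matching-substring>
      <!-- Text outside the delimited run is copied through unchanged. -->
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:copy>
</xsl:template>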

</xsl:stylesheet>

transforms the input

<doc>
      <arr name="content">
      <str>  stream_source_info docname   stream_content_type text/html   stream_size 412   Content-Encoding ISO-8859-1   stream_name docname   Content-Type text/html; charset=ISO-8859-1   resourceName docname       ||phone:3282||email:Lori.KS@.edu||officenumber:D-107A||vcard:https://c3qa/profiles/vcard/profile.do?key=5c28d263-d8aa-4a8a-ae90-4e8b13de7a0b||photo:https://c3qa/profiles/photo.do?key=5c28d263-d8aa-4a8a-ae90-4e8b13de7a0b&amp;lastMod=1348674215846||pronunciation:https://c3qa/profiles/audio.do?key=5c28d263-d8aa-4a8a-ae90-4e8b13de7a0b&amp;lastMod=1348674215846||  </str>
    </arr>
</doc>

into the result

<doc>
      <arr name="content">
      <str>  stream_source_info docname   stream_content_type text/html   stream_size 412   Content-Encoding ISO-8859-1   stream_name docname   Content-Type text/html; charset=ISO-8859-1   resourceName docname       <phone>3282</phone>
         <email>Lori.KS@.edu</email>
         <officenumber>D-107A</officenumber>
         <vcard>https://c3qa/profiles/vcard/profile.do?key=5c28d263-d8aa-4a8a-ae90-4e8b13de7a0b</vcard>
         <photo>https://c3qa/profiles/photo.do?key=5c28d263-d8aa-4a8a-ae90-4e8b13de7a0b&amp;lastMod=1348674215846</photo>
         <pronunciation>https://c3qa/profiles/audio.do?key=5c28d263-d8aa-4a8a-ae90-4e8b13de7a0b&amp;lastMod=1348674215846</pronunciation>  
      </str>
    </arr>
</doc>
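
The question also asks for the URLs to be wrapped in CDATA, which the result above does not do (the ampersands are serialized as &amp;amp; instead, which is equivalent for any XML parser). One possible way to get CDATA sections is the cdata-section-elements attribute of xsl:output. The following is only a sketch, and it assumes the URL-bearing field names are known ahead of time; the three names listed (vcard, photo, pronunciation) are taken from the sample input and are an assumption, not something the stylesheet can discover on its own.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Assumption: vcard, photo and pronunciation are the URL-bearing fields.
     Text inside elements with these names is serialized as CDATA sections. -->
<xsl:output indent="yes" cdata-section-elements="vcard photo pronunciation"/>

<!-- Identity template: copy everything that has no more specific rule. -->
<xsl:template match="node()|@*">
    <xsl:copy>
        <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
</xsl:template>

<!-- Same two-pass analyze-string approach as in the answer's stylesheet. -->
<xsl:template match="str">
  <xsl:copy>
    <xsl:analyze-string select="." regex="\|((\|[^|]+\|)+)\|">
      <xsl:matching-substring>
        <xsl:analyze-string select="regex-group(1)" regex="\|(\w+):([^|]+)\|">
          <xsl:matching-substring>
            <xsl:element name="{regex-group(1)}">
              <xsl:value-of select="regex-group(2)"/>
            </xsl:element>
          </xsl:matching-substring>
        </xsl:analyze-string>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>
```

Because cdata-section-elements takes a fixed list of element names, this only helps when the set of URL fields is stable. If the field names really are unpredictable, the escaped form produced by the original stylesheet is probably the safer choice.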
