XSL - create well-formed XML from text file
Problem description
I have a pipe-delimited text file, shown below, which I need to transform into a well-formed XML structure (example below) using XSL. The XSL below is my latest attempt; however, I cannot find a way to nest the level 002 elements inside level 001, i.e. to maintain the parent-child relationship while iterating through the file line by line. Could anyone help?
Pipe-delimited file - input
001|XXX|YYY
002|AAA|BBB
002|CCC|DD
001|EEF|XXX
002|HHH|GGG
XML file - desired output
<root>
  <level001>
    <elem name="field1">001</elem>
    <elem name="field2">XXX</elem>
    <elem name="field3">YYY</elem>
    <level002>
      <elem name="field1">002</elem>
      <elem name="field2">AAA</elem>
      <elem name="field3">BBB</elem>
    </level002>
    <level002>
      <elem name="field1">002</elem>
      <elem name="field2">CCC</elem>
      <elem name="field3">DD</elem>
    </level002>
  </level001>
  <level001>
    <elem name="field1">001</elem>
    <elem name="field2">EEF</elem>
    <elem name="field3">XXX</elem>
    <level002>
      <elem name="field1">002</elem>
      <elem name="field2">HHH</elem>
      <elem name="field3">GGG</elem>
    </level002>
  </level001>
</root>
Current XSL
<xsl:variable name="Cols">
<col>field1,1</col>
<col>field2,2</col>
<col>field3,3</col>
</xsl:variable>
<xsl:template match="/" name="main">
<xsl:choose>
<xsl:when test="unparsed-text-available($pathToCSV, $encoding)">
<xsl:variable name="csv" select="unparsed-text($pathToCSV, $encoding)" />
<xsl:variable name="lines" select="tokenize($csv, '\n')" as="xs:string+" />
<root>
<xsl:for-each select="$lines[position() > 0]">
<xsl:if test="translate(., '  	 ', '') != ''">
<level001>
<xsl:variable name="line" select="." />
<xsl:variable name="columns" select="tokenize(.,'\|')" as="xs:string+"/>
<xsl:choose>
<xsl:when test="$columns[1]='001'">
<xsl:for-each select="$Cols/col">
<xsl:variable name="column" select="number(substring-after(.,','))"/>
<elem name="{substring-before(.,',')}">
<!-- trims the whitespace from the beginning and the ending of the value -->
<xsl:value-of select="replace(replace($columns[$column],'\s+$',''),'^\s+','')"/>
</elem>
</xsl:for-each>
</xsl:when>
<xsl:when test="$columns[1]='002'">
<level002>
<xsl:for-each select="$Cols/col">
<xsl:variable name="column" select="number(substring-after(.,','))"/>
<elem name="{substring-before(.,',')}">
<!-- trims the whitespace from the beginning and the ending of the value -->
<xsl:value-of select="replace(replace($columns[$column],'\s+$',''),'^\s+','')"/>
</elem>
</xsl:for-each>
</level002>
</xsl:when>
</xsl:choose>
</level001>
</xsl:if>
</xsl:for-each>
</root>
</xsl:when>
</xsl:choose>
Recommended answer
I would first transform the flat text into a flat XML structure and then group it using xsl:for-each-group with group-starting-with, as in the following code sample:
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:mf="http://example.com/mf"
  exclude-result-prefixes="mf xs"
  version="2.0">

  <xsl:param name="text-url" as="xs:string" select="'test2012090401.txt'"/>
  <xsl:param name="sep" as="xs:string" select="'\|'"/>
  <xsl:param name="field" as="xs:string" select="'field'"/>

  <xsl:output indent="yes"/>

  <!-- Recursively nests a flat sequence of line elements: every line whose
       first field equals $level starts a new group. -->
  <xsl:function name="mf:group" as="node()*">
    <xsl:param name="nodes" as="node()*"/>
    <xsl:param name="level" as="xs:integer"/>
    <xsl:for-each-group select="$nodes" group-starting-with="line[xs:integer(elem[1]) eq $level]">
      <!-- The element name is built from the first field, e.g. level001 -->
      <xsl:element name="level{*[1]}">
        <xsl:copy-of select="*"/>
        <!-- Group the remaining lines of this group one level deeper -->
        <xsl:sequence select="mf:group(current-group() except ., $level + 1)"/>
      </xsl:element>
    </xsl:for-each-group>
  </xsl:function>

  <xsl:template name="main">
    <!-- Step 1: turn each text line into a flat line element with one elem per field -->
    <xsl:variable name="flat">
      <xsl:for-each select="tokenize(unparsed-text($text-url), '\r?\n')">
        <line>
          <xsl:for-each select="tokenize(., $sep)">
            <elem name="{$field}{position()}">
              <xsl:value-of select="."/>
            </elem>
          </xsl:for-each>
        </line>
      </xsl:for-each>
    </xsl:variable>
    <!-- Step 2: nest the flat lines, starting at level 1 -->
    <root>
      <xsl:sequence select="mf:group($flat/line, 1)"/>
    </root>
  </xsl:template>

</xsl:stylesheet>
When I apply that stylesheet with Saxon 9 using java -jar saxon9he.jar -it:main -xsl:sheet.xsl, the result I get is:
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <level001>
    <elem name="field1">001</elem>
    <elem name="field2">XXX</elem>
    <elem name="field3">YYY</elem>
    <level002>
      <elem name="field1">002</elem>
      <elem name="field2">AAA</elem>
      <elem name="field3">BBB</elem>
    </level002>
    <level002>
      <elem name="field1">002</elem>
      <elem name="field2">CCC</elem>
      <elem name="field3">DD</elem>
    </level002>
  </level001>
  <level001>
    <elem name="field1">001</elem>
    <elem name="field2">EEF</elem>
    <elem name="field3">XXX</elem>
    <level002>
      <elem name="field1">002</elem>
      <elem name="field2">HHH</elem>
      <elem name="field3">GGG</elem>
      <level/>
    </level002>
  </level001>
</root>
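The empty level element at the end of that result is most likely caused by a trailing newline in the input file (this is an assumption; the answer does not say so). Tokenizing on '\r?\n' would then yield a final empty string, which becomes an empty line element with no fields, and the grouping function names its wrapper just "level". One possible way to avoid this, sketched below, is to drop blank lines before building the flat structure. The inline $text string and the [normalize-space()] predicate are illustrative additions, not part of the original stylesheet:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

  <xsl:output indent="yes"/>

  <xsl:template name="main">
    <!-- Hypothetical inline sample standing in for unparsed-text($text-url);
         note the trailing newline (&#10;) at the end of the string -->
    <xsl:variable name="text" select="'001|XXX|YYY&#10;002|AAA|BBB&#10;'"/>
    <root>
      <!-- [normalize-space()] keeps only lines that contain non-whitespace text -->
      <xsl:for-each select="tokenize($text, '\r?\n')[normalize-space()]">
        <line>
          <xsl:for-each select="tokenize(., '\|')">
            <elem name="field{position()}">
              <xsl:value-of select="."/>
            </elem>
          </xsl:for-each>
        </line>
      </xsl:for-each>
    </root>
  </xsl:template>

</xsl:stylesheet>
```

Run with -it:main, this should produce two line elements rather than three. The same predicate could be appended to the tokenize(unparsed-text($text-url), '\r?\n') expression in the full stylesheet.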
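The stylesheet (sheet.xsl above) has a parameter named text-url that points to the plain text file; you can set it when running the stylesheet, with Saxon for instance by appending text-url=input.txt to the command line (input.txt being a placeholder file name).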
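To see what group-starting-with does in isolation, here is a minimal single-level sketch. The inline $flat data and the group wrapper element with its size attribute are hypothetical and exist only for this illustration; the answer's stylesheet applies the same idea recursively through mf:group.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

  <xsl:output indent="yes"/>

  <!-- Hypothetical inline data standing in for the flat structure built from the text file -->
  <xsl:variable name="flat">
    <line><elem>001</elem><elem>XXX</elem></line>
    <line><elem>002</elem><elem>AAA</elem></line>
    <line><elem>002</elem><elem>CCC</elem></line>
    <line><elem>001</elem><elem>EEF</elem></line>
    <line><elem>002</elem><elem>HHH</elem></line>
  </xsl:variable>

  <xsl:template name="main">
    <root>
      <!-- A new group begins at every line whose first field is 001 -->
      <xsl:for-each-group select="$flat/line" group-starting-with="line[elem[1] = '001']">
        <group size="{count(current-group())}">
          <xsl:copy-of select="current-group()"/>
        </group>
      </xsl:for-each-group>
    </root>
  </xsl:template>

</xsl:stylesheet>
```

With this sample data the result contains two group elements, holding three and two line elements respectively: each 001 line together with the 002 lines that follow it.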