Generate XML format from TXT file
Problem description
I have the input TXT file below, and I'm trying to generate the XML file below from it. I'm trying to do it with awk, but I think I'm reinventing the wheel. How would you suggest I do it? Thanks
Input TXT file (a sample; the real input could be bigger)
Usw 1:1 Desktop
Usw 1:2 Netbooks
Usw 1:3 Servers, mainframes and supercomputers
Usw 1:4 Smart devices
Usw 1:5 Embedded devices
Usw 1:6 Gaming
Usw 1:7 Specialized uses
Usw 2:1 Precursors
Usw 2:2 Creation
Usw 2:5 Naming
Usw 2:6 Commercial and popular uptake
Usw 2:9 Current development
Des 1:1 User interface
Des 1:2 Video input infrastructure
Des 1:3 Hardware
Des 2:1 Community
Des 2:2 Programming on Linux
Desired XML file
<?xml version="1.0" encoding="utf-8"?>
<XMLRT xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="SomeSchema.xsd" bename="The name" status="v" version="1.4" revision="1" type="x-rt">
<INTRO>
<title>Some title</title>
<creator>
</creator>
<subject>Some subject</subject>
<description>Some description</description>
<date>2010-05-12</date>
<type>Some text</type>
</INTRO>
<RTBLOCK bname="Usw" bnumber="1" bsname="1U">
<CTR cnumber="1">
<ES vnumber="1">Desktop</ES>
<ES vnumber="2">Netbooks</ES>
<ES vnumber="3">Servers, mainframes and supercomputers</ES>
<ES vnumber="4">Smart devices</ES>
<ES vnumber="5">Embedded devices</ES>
<ES vnumber="6">Gaming</ES>
<ES vnumber="7">Specialized uses</ES>
</CTR>
<CTR cnumber="2">
<ES vnumber="1">Precursors</ES>
<ES vnumber="2">Creation</ES>
<ES vnumber="5">Naming</ES>
<ES vnumber="6">Commercial and popular uptake</ES>
<ES vnumber="9">Current development</ES>
</CTR>
</RTBLOCK>
<RTBLOCK bname="Des" bnumber="1" bsname="1D">
<CTR cnumber="1">
<ES vnumber="1">User interface</ES>
<ES vnumber="2">Video input infrastructure</ES>
<ES vnumber="3">Hardware</ES>
</CTR>
<CTR cnumber="2">
<ES vnumber="1">Community</ES>
<ES vnumber="2">Programming on Linux</ES>
</CTR>
</RTBLOCK>
</XMLRT>
Recommended answer
Just to show you don't need an XML-aware tool to generate the specific XML you need for any given purpose, here's one way to do it for your example:
$ cat tst.awk
BEGIN {
    # Fixed prologue and <INTRO> header, printed once before any input is read
    print "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
    print ""
    print "<XMLRT xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xsi:noNamespaceSchemaLocation=\"SomeSchema.xsd\" bename=\"The name\" status=\"v\" version=\"1.4\" revision=\"1\" type=\"x-rt\">"
    print "<INTRO>"
    print " <title>Some title</title>"
    print " <creator>"
    print " </creator>"
    print " <subject>Some subject</subject>"
    print " <description>Some description</description>"
    print " <date>2010-05-12</date>"
    print " <type>Some text</type>"
    print "</INTRO>"
    # printf formats for the repeating elements
    rtBeg = "<RTBLOCK bname=\"%s\" bnumber=\"1\" bsname=\"1%s\">\n"
    ctrBeg = " <CTR cnumber=\"%d\">\n"
    esBody = " <ES vnumber=\"%d\">%s</ES>\n"
    ctrEnd = " </CTR>\n"
    rtEnd = "</RTBLOCK>\n"
    xmlEnd = "</XMLRT>\n"
}
{
    # Split a line like "Usw 1:3 Servers, mainframes and supercomputers"
    # into the block name, CTR number, ES number and the remaining text
    bname = $1
    split($2,tmp,/:/)
    cnum = tmp[1]
    vnum = tmp[2]
    text = $0
    sub(/([^[:space:]]+[[:space:]]+){2}/,"",text)   # drop the first two fields from the text
}
# First line of a new block: close any open CTR and RTBLOCK, then open a new RTBLOCK
bname != prevBname {
    if (prevCnum != "") printf ctrEnd
    if (prevBname != "") printf rtEnd
    printf rtBeg, bname, substr(bname,1,1)
    prevCnum = ""
    prevBname = bname
}
# First line of a new CTR number within the block: close the previous CTR, open a new one
cnum != prevCnum {
    if (prevCnum != "") printf ctrEnd
    printf ctrBeg, cnum
    prevCnum = cnum
}
# Every input line becomes one <ES> element
{ printf esBody, vnum, text }
END {
    # Close whatever is still open, then the root element
    if (prevCnum != "") printf ctrEnd
    if (prevBname != "") printf rtEnd
    printf xmlEnd
}
$ awk -f tst.awk file
<?xml version="1.0" encoding="utf-8"?>
<XMLRT xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="SomeSchema.xsd" bename="The name" status="v" version="1.4" revision="1" type="x-rt">
<INTRO>
<title>Some title</title>
<creator>
</creator>
<subject>Some subject</subject>
<description>Some description</description>
<date>2010-05-12</date>
<type>Some text</type>
</INTRO>
<RTBLOCK bname="Usw" bnumber="1" bsname="1U">
<CTR cnumber="1">
<ES vnumber="1">Desktop</ES>
<ES vnumber="2">Netbooks</ES>
<ES vnumber="3">Servers, mainframes and supercomputers</ES>
<ES vnumber="4">Smart devices</ES>
<ES vnumber="5">Embedded devices</ES>
<ES vnumber="6">Gaming</ES>
<ES vnumber="7">Specialized uses</ES>
</CTR>
<CTR cnumber="2">
<ES vnumber="1">Precursors</ES>
<ES vnumber="2">Creation</ES>
<ES vnumber="5">Naming</ES>
<ES vnumber="6">Commercial and popular uptake</ES>
<ES vnumber="9">Current development</ES>
</CTR>
</RTBLOCK>
<RTBLOCK bname="Des" bnumber="1" bsname="1D">
<CTR cnumber="1">
<ES vnumber="1">User interface</ES>
<ES vnumber="2">Video input infrastructure</ES>
<ES vnumber="3">Hardware</ES>
</CTR>
<CTR cnumber="2">
<ES vnumber="1">Community</ES>
<ES vnumber="2">Programming on Linux</ES>
</CTR>
</RTBLOCK>
</XMLRT>
The above will work efficiently, robustly and portably with any POSIX awk in any shell on any UNIX box.
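One thing the script above does not handle: the text after the `Usw 1:1`-style prefix is copied into `<ES>` verbatim, so an input line such as `Des 2:3 Tools & <utilities>` (a made-up example, not from the question's data) would produce output that is not well-formed XML. A minimal sketch of an escaping function follows; it is not part of the original answer, and the names `xmlesc` and `xml_escape_lines` are hypothetical. It relies on the standard awk rule that `\&` in a `gsub()` replacement means a literal ampersand. Adding the same `function xmlesc` definition to tst.awk and changing the last per-line rule to `{ printf esBody, vnum, xmlesc(text) }` should be enough to apply it there.

```shell
#!/bin/sh
# Hypothetical demo: read lines on stdin and print them with the characters
# that are special in XML text and attribute values replaced by entities.
xml_escape_lines() {
    awk '
    # xmlesc(s): return s with &, <, > and " replaced by XML entities.
    # The awk string "\\&amp;" becomes \&amp; at run time, and gsub() treats
    # \& as a literal ampersand rather than as "the matched text".
    function xmlesc(s) {
        gsub(/&/, "\\&amp;", s)    # must run first so the entities added below are not re-escaped
        gsub(/</, "\\&lt;", s)
        gsub(/>/, "\\&gt;", s)
        gsub(/"/, "\\&quot;", s)
        return s
    }
    { print xmlesc($0) }'
}

# Usage: prints  Tools &amp; &lt;utilities&gt;
printf '%s\n' 'Tools & <utilities>' | xml_escape_lines
```

Escaping `&` first matters: doing it last would turn the `&lt;` produced by the `<` step into `&amp;lt;`.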