Extract data from XML using ksh script
The first question I asked on this topic was closed because of lack of info. So asking this again with some more details added.
I have to extract a value given in one tag from an XML file, and I have to do it in ksh. (I can solve this in perl, but I must use ksh and cannot use third-party tools like xmlsh.)
sample.xml
<?xml version="1.0" standalone="yes" ?>
<parent_one>
<parent_two>
<Pool>
<pool_name>ABC</pool_name>
<percent_full>79</percent_full>
<pool_state>Enabled</pool_state>
</Pool>
<Pool>
<pool_name>DEF</pool_name>
<percent_full>40</percent_full>
<pool_state>Enabled</pool_state>
</Pool>
<Pool>
<pool_name>XYZ</pool_name>
<percent_full>40</percent_full>
<pool_state>Disabled</pool_state>
</Pool>
<Totals>
<total_tracks>4546456</total_tracks>
<percent_full>48</percent_full>
</Totals>
</parent_two>
</parent_one>
The ksh script should read sample.xml and print ABC and DEF from the pool_name tag, because the corresponding pool_state tag is Enabled. It should not print XYZ, because its pool_state tag is Disabled.
The ksh script would read sample.xml and output the following:
ABC
DEF
Is this feasible in ksh or do I have to use perl for this?
I've done quite a lot of parsing of odd format files with (n)awk. Technically, this could be done with just ksh, but awk (and perl) are easier...
The following sample makes use of the start, end range pattern in awk, which processes only the lines from a line matching the start pattern through the next line matching the end pattern (in this case <Pool> and </Pool>).
Other than that it's straightforward, using variables mimicking the xml elements for clarity.
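awk '/<Pool>/,/<\/Pool>/ {
    # Remember whether this pool is enabled (1) or not (0).
    if (/<pool_state>/) {
        pool_state = (/<pool_state>Enabled<\/pool_state>/)
    }
    # Strip the surrounding tags so that $0 holds only the pool name.
    if (/<pool_name>/) {
        if (gsub(/.*<pool_name>|<\/pool_name>.*/, "")) {
            pool_name = $0
        }
    }
    # At the end of each Pool element, print the name if enabled, then reset.
    # (awk has no unset statement, so the variables are cleared by assignment.)
    if (/<\/Pool>/) {
        if (pool_name != "" && pool_state)
            print pool_name
        pool_name = ""
        pool_state = 0
    }
}' sample.xml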
Because it matches line by line instead of actually parsing the XML, this code will fail horribly when the xml is malformed, when multiple Pool elements are listed on a single line, etc.
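For completeness, the ksh-only route mentioned at the start of this answer could look roughly like the sketch below. It is not a tested production script. It assumes one XML element per line, as in sample.xml, and uses only POSIX constructs (case patterns and parameter expansion), so it should behave the same under ksh and other POSIX shells. The function name enabled_pools is a hypothetical helper chosen for illustration.

```shell
# Sketch only: print the pool_name of every Pool whose pool_state is Enabled.
# Assumption: each element sits on its own line, as in sample.xml.
# enabled_pools is a hypothetical helper name, not from the original answer.
# The body runs in a subshell "( ... )" so its variables do not leak out.
enabled_pools() (
    in_pool=0 name= state=
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            *"<Pool>"*)
                # A new Pool element starts: forget the previous values.
                in_pool=1 name= state= ;;
            *"<pool_name>"*"</pool_name>"*)
                if [ "$in_pool" -eq 1 ]; then
                    # Keep only the text between the opening and closing tag.
                    name=${line#*"<pool_name>"}
                    name=${name%%"</pool_name>"*}
                fi ;;
            *"<pool_state>"*"</pool_state>"*)
                if [ "$in_pool" -eq 1 ]; then
                    state=${line#*"<pool_state>"}
                    state=${state%%"</pool_state>"*}
                fi ;;
            *"</Pool>"*)
                # End of the element: report it only if it was enabled.
                if [ "$state" = Enabled ] && [ -n "$name" ]; then
                    printf '%s\n' "$name"
                fi
                in_pool=0 ;;
        esac
    done < "$1"
)
```

Called as enabled_pools sample.xml, this prints ABC and DEF for the sample file above. It shares the limitations of the awk version: it is line-oriented pattern matching, not XML parsing.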