Extract data from XML using ksh script

Problem description

My first question on this topic was closed for lack of information, so I am asking again with more details.

I have to extract the value of one tag from an XML file, and I have to do it in ksh. (I could solve this in perl, but ksh is required, and I cannot use third-party tools such as xmlsh.)

sample.xml

<?xml version="1.0" standalone="yes" ?>
<parent_one>
  <parent_two>
    <Pool>
      <pool_name>ABC</pool_name>
      <percent_full>79</percent_full>
      <pool_state>Enabled</pool_state>
    </Pool>
    <Pool>
      <pool_name>DEF</pool_name>
      <percent_full>40</percent_full>
      <pool_state>Enabled</pool_state>
    </Pool>
    <Pool>
      <pool_name>XYZ</pool_name>
      <percent_full>40</percent_full>
      <pool_state>Disabled</pool_state>
    </Pool> 
    <Totals>
      <total_tracks>4546456</total_tracks>
      <percent_full>48</percent_full>
    </Totals>
  </parent_two>
</parent_one>

The ksh script should read sample.xml and print ABC and DEF from the pool_name tags, because the corresponding pool_state is Enabled. It should not print XYZ, because its pool_state is Disabled.

The expected output is:

ABC
DEF

Is this feasible in ksh or do I have to use perl for this?

Solution

I've parsed quite a lot of oddly formatted files with (n)awk. Technically, this could be done with ksh alone, but awk (and perl) make it easier.

The following sample uses the start,end range pattern in awk, which runs its action only on the lines from one matching the start pattern through the next one matching the end pattern, inclusive. (In this case, <Pool> and </Pool>.)
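To see the range pattern in isolation, here is a minimal sketch. The START and STOP markers, the sample lines, and the function name range_demo are made up for illustration and are not part of the original answer.

```shell
# Illustrative only: feed six lines to awk and print the ones that fall
# inside the range, including the lines that match the two patterns.
range_demo() {
    printf '%s\n' 'before' 'START' 'inside 1' 'inside 2' 'STOP' 'after' |
        awk '/START/,/STOP/ { print }'
}

range_demo
```

The lines "before" and "after" are skipped, because the action only fires while the range is open.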

The rest of the script is straightforward; its variables are named after the XML elements for clarity.

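awk '/<Pool>/,/<\/Pool>/ {
    # Remember whether this Pool is enabled (1) or not (0).
    if (/<pool_state>/) {
        pool_state = (/<pool_state>Enabled<\/pool_state>/)
    }
    # Strip the tags and everything around them, leaving only the name.
    if (/<pool_name>/) {
        if ( gsub(/.*<pool_name>|<\/pool_name>.*/, "") ) {
            pool_name = $0
        }
    }
    # At the end of the element, print the name if enabled, then reset.
    if (/<\/Pool>/) {
        if (pool_name != "" && pool_state)
            print pool_name
        pool_name = ""    # awk has no "unset"; assign empty values instead
        pool_state = 0
    }
}' sample.xml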

This code will fail badly if the XML is malformed or laid out differently, for example when multiple Pool elements are listed on a single line.
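Since the question asks for ksh itself, here is a rough sketch of the same logic using only shell built-ins (a read loop, case patterns, and parameter expansion). It is one possible approach rather than the answerer's method, it sticks to POSIX syntax so it should also run under ksh, and it makes the same assumption as the awk version: one tag per line. The function name enabled_pools is hypothetical.

```shell
# Sketch: print pool_name for every <Pool> whose pool_state is Enabled.
# Assumes each tag sits on its own line, as in sample.xml.
enabled_pools() {
    in_pool=0 name= state=
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            *'<Pool>'*)
                # New element: forget values from the previous one.
                in_pool=1 name= state= ;;
            *'<pool_name>'*'</pool_name>'*)
                if [ "$in_pool" -eq 1 ]; then
                    name=${line#*"<pool_name>"}      # drop text up to the opening tag
                    name=${name%%"</pool_name>"*}    # drop the closing tag and the rest
                fi ;;
            *'<pool_state>'*'</pool_state>'*)
                if [ "$in_pool" -eq 1 ]; then
                    state=${line#*"<pool_state>"}
                    state=${state%%"</pool_state>"*}
                fi ;;
            *'</Pool>'*)
                if [ "$state" = Enabled ] && [ -n "$name" ]; then
                    printf '%s\n' "$name"
                fi
                in_pool=0 ;;
        esac
    done < "$1"
}
```

It would be called as enabled_pools sample.xml. Like the awk version, it is a line-oriented text filter, not an XML parser, so the same caveats about malformed or differently formatted input apply.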
