Parse XML using shell tools like grep, awk or sed


Problem description

I have the following XML and want to extract the value of the name tag based on the value of the type tag: extract the name only if type == 'hosted'. I would like to do this with shell tools like grep, sed and awk. I have extracted a single tag value unconditionally before, but never with a condition. I could easily do it in Python or any other programming language I know, but it would be ideal if it could be done in a shell script.

...
    <repositories-item>
      <name>hosted-npm</name>
      <type>hosted</type>
    </repositories-item>
    <repositories-item>
      <name>proxied-npm</name>
      <type>proxied</type>
    </repositories-item>
...

Recommended answer

xmlstarlet is a command-line XML toolkit that can express complex XSLT templates as a short sequence of command-line switches.

Suppose we are given a well-formed XML document, repos.xml:

<repositories>
  <repositories-item>
    <name>hosted-npm</name>
    <type>hosted</type>
  </repositories-item>
  <repositories-item>
    <name>proxied-npm</name>
    <type>proxied</type>
  </repositories-item>
</repositories>

If you run it through an XMLStarlet filter with the following switches:

$ cat repos.xml | xmlstarlet sel -t -m '//repositories-item' \
                 -i 'type="hosted"' -v 'name' -n 

You will get one line of output:

hosted-npm

Let's look at the XMLStarlet command line.

  1. We run the command in select mode, specified with the sel subcommand.
  2. We start a selection template with the -t switch.
  3. We restrict matching to <repositories-item> elements, using the //repositories-item XPath expression given to the -m switch.
  4. We keep only those elements whose type child element has the value "hosted", using the condition given to the -i switch.
  5. We print the value of the name element, specified with the -v switch.
  6. After each value we print a newline, specified with the -n switch.
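
The match and condition steps above can also be folded into a single XPath expression with a predicate. The following is a minimal sketch, not part of the original answer: the function name names_by_type is hypothetical, and it assumes xmlstarlet is installed under that name and that repos.xml has the layout shown earlier.

```shell
#!/bin/sh
# Recreate the sample document so the sketch is self-contained.
cat > repos.xml <<'EOF'
<repositories>
  <repositories-item>
    <name>hosted-npm</name>
    <type>hosted</type>
  </repositories-item>
  <repositories-item>
    <name>proxied-npm</name>
    <type>proxied</type>
  </repositories-item>
</repositories>
EOF

# names_by_type is a hypothetical helper, not an xmlstarlet feature.
# $1 = repository type to keep, $2 = XML file to read.
# Assumption: $1 contains no single quotes, since it is pasted into the XPath.
names_by_type() {
  # The predicate [type='...'] replaces the separate -m and -i switches.
  xmlstarlet sel -t -v "//repositories-item[type='$1']/name" -n "$2"
}

# Only call the helper when the tool is actually available.
if command -v xmlstarlet >/dev/null 2>&1; then
  names_by_type hosted repos.xml
else
  echo "xmlstarlet is not installed" >&2
fi
```

Passing the file name as an argument also avoids the extra cat process; piping the document on standard input works as well.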

Here is the equivalent XSLT generated by XMLStarlet:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:exslt="http://exslt.org/common" version="1.0" extension-element-prefixes="exslt">
  <xsl:output omit-xml-declaration="yes" indent="no"/>
  <xsl:template match="/">
    <xsl:for-each select="//repositories-item">
      <xsl:choose>
        <xsl:when test="type=&quot;hosted&quot;">
          <xsl:call-template name="value-of-template">
            <xsl:with-param name="select" select="name"/>
          </xsl:call-template>
          <xsl:value-of select="'&#10;'"/>
        </xsl:when>
      </xsl:choose>
    </xsl:for-each>
  </xsl:template>
  <xsl:template name="value-of-template">
    <xsl:param name="select"/>
    <xsl:value-of select="$select"/>
    <xsl:for-each select="exslt:node-set($select)[position()&gt;1]">
      <xsl:value-of select="'&#10;'"/>
      <xsl:value-of select="."/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

Per Charles Duffy's suggestion, it is worth noting that this XSLT stylesheet can be generated by XMLStarlet itself using the -C option:

xmlstarlet sel -C -t -m '//repositories-item' \
       -i 'type="hosted"' -v 'name' -n > hosted-repos.xslt

The generated XSLT stylesheet can then be used directly with xsltproc:

cat repos.xml | xsltproc hosted-repos.xslt - 
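
Since the question asks specifically about grep, sed and awk, here is a rough awk sketch for cases where neither xmlstarlet nor xsltproc can be installed. It is not part of the original answer and is not a real XML parser: it assumes every name and type element sits on its own line, exactly as in the sample, with no attributes, comments, CDATA or entities. The function name hosted_names_awk is hypothetical.

```shell
#!/bin/sh
# Recreate the sample document so the sketch is self-contained.
cat > repos.xml <<'EOF'
<repositories>
  <repositories-item>
    <name>hosted-npm</name>
    <type>hosted</type>
  </repositories-item>
  <repositories-item>
    <name>proxied-npm</name>
    <type>proxied</type>
  </repositories-item>
</repositories>
EOF

# hosted_names_awk is a hypothetical helper; $1 = XML file to read.
# Fields are split on < and >, so on a line like "<name>x</name>"
# the element text ends up in $3.
hosted_names_awk() {
  awk -F'[<>]' '
    /<repositories-item>/   { name = ""; type = "" }   # new item: reset state
    /<name>/                { name = $3 }              # remember the name
    /<type>/                { type = $3 }              # remember the type
    /<\/repositories-item>/ { if (type == "hosted") print name }
  ' "$1"
}

hosted_names_awk repos.xml
```

Because the decision is made at the closing tag, the order of name and type inside an item does not matter. Any change in layout, such as several elements on one line, breaks the sketch, which is why an XML-aware tool remains the safer choice.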
