Bash, Remove empty XML tags


Question


    I need some help with a couple of questions, using bash tools.

    1. I want to remove empty XML tags from a file, e.g.:

     <CreateOfficeCode>
          <OperatorId>ve</OperatorId>
          <OfficeCode>1234</OfficeCode>
          <CountryCodeLength>0</CountryCodeLength>
          <AreaCodeLength>3</AreaCodeLength>
          <Attributes></Attributes>
          <ChargeArea></ChargeArea>
     </CreateOfficeCode>
    

    to become:

     <CreateOfficeCode>
          <OperatorId>ve</OperatorId>
          <OfficeCode>1234</OfficeCode>
          <CountryCodeLength>0</CountryCodeLength>
          <AreaCodeLength>3</AreaCodeLength>
     </CreateOfficeCode>
    

    So far I have done this with the following command:

    sed -i '/><\//d' file
    

    This is not very strict; it is more of a trick. A more appropriate approach would be to find each <pattern></pattern> pair and remove it. Any suggestions?

    2. Second, how do I go from:

     <CreateOfficeGroup>
           <CreateOfficeName>John</CreateOfficeName>
           <CreateOfficeCode>
           </CreateOfficeCode>
     </CreateOfficeGroup>
    

    to:

     <CreateOfficeGroup>
           <CreateOfficeName>John</CreateOfficeName>
     </CreateOfficeGroup>
    

    3. And how do I do both at once? From:

     <CreateOfficeGroup>
           <CreateOfficeName>John</CreateOfficeName>
           <CreateOfficeCode>
                <OperatorId>ve</OperatorId>
                <OfficeCode>1234</OfficeCode>
                <CountryCodeLength>0</CountryCodeLength>
                <AreaCodeLength>3</AreaCodeLength>
                <Attributes></Attributes>
                <ChargeArea></ChargeArea>
           </CreateOfficeCode>
           <CreateOfficeSize>
                <Chairs></Chairs>
                <Tables></Tables>
           </CreateOfficeSize>
     </CreateOfficeGroup>
    

    to:

     <CreateOfficeGroup>
           <CreateOfficeName>John</CreateOfficeName>
           <CreateOfficeCode>
                <OperatorId>ve</OperatorId>
                <OfficeCode>1234</OfficeCode>
                <CountryCodeLength>0</CountryCodeLength>
                <AreaCodeLength>3</AreaCodeLength>
           </CreateOfficeCode>
     </CreateOfficeGroup>
    

    Could you answer each question individually? Thank you very much!

    Solution

    XMLStarlet is a command-line XML processor. Doing what you want with it is a one-line operation (until the desired recursive behavior is added). Because it parses the document instead of matching text line by line, it works for every syntactic variant of the same XML input:

    The simple version:

    xmlstarlet ed \
      -d '//*[not(./*) and (not(./text()) or normalize-space(./text())="")]' \
      input.xml
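
To make the XPath predicate above more concrete, here is a rough Python sketch of a single deletion pass using the standard library's xml.etree.ElementTree. The helper names is_empty_leaf and strip_once and the sample document are made up for illustration, and the sketch only approximates the XPath expression (for example, it never removes the root element). It is meant to show why one pass is not enough for the third question: a parent that becomes empty during the pass is still present afterwards.

```python
import xml.etree.ElementTree as ET

def is_empty_leaf(el):
    """True if el has no child elements and no non-whitespace text."""
    return len(el) == 0 and not (el.text or "").strip()

def strip_once(root):
    """Remove every element that is an empty leaf at the start of the pass."""
    # Select first and delete afterwards, so that a parent emptied by this
    # pass is not itself removed until a later pass.
    doomed = [(parent, child)
              for parent in root.iter()
              for child in parent
              if is_empty_leaf(child)]
    for parent, child in doomed:
        parent.remove(child)
    return len(doomed)

# Hypothetical sample, shortened from the third question.
SAMPLE = """<CreateOfficeGroup>
  <CreateOfficeName>John</CreateOfficeName>
  <CreateOfficeSize>
    <Chairs></Chairs>
    <Tables/>
  </CreateOfficeSize>
</CreateOfficeGroup>"""

root = ET.fromstring(SAMPLE)
removed = strip_once(root)
remaining = [el.tag for el in root.iter()]
print(removed, remaining)
```

In this sample, strip_once removes Chairs and Tables but leaves the now-empty CreateOfficeSize in place, which is the gap the looping version closes.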
    

    The fancy version:

    strip_recursively() {
      local doc last_doc
      IFS= read -r -d '' doc 
      while :; do
        last_doc=$doc
        doc=$(xmlstarlet ed \
               -d '//*[not(./*) and (not(./text()) or normalize-space(./text())="")]' \
               /dev/stdin <<<"$last_doc")
        if [[ $doc = "$last_doc" ]]; then
          printf '%s\n' "$doc"
          return
        fi
      done
    }
    strip_recursively <input.xml
    

    /dev/stdin is used rather than - (at some cost to platform portability) for better portability across releases of XMLStarlet; adjust to taste.


    On a system with only older dependencies installed, the XML parser most likely to be available is the one bundled with Python:

    #!/usr/bin/env python3

    import sys
    import xml.etree.ElementTree as etree

    doc = etree.parse(sys.stdin)

    def prune(parent):
        """Remove empty leaf elements below parent; return True if any were removed."""
        ever_changed = False
        while True:
            changed = False
            # Iterate over a copy, since children are removed while looping.
            for el in list(parent):
                if len(el) == 0:
                    # Empty means: no child elements, and only whitespace in
                    # both the element's text and the text that follows it.
                    if ((el.text is None or el.text.strip() == '') and
                            (el.tail is None or el.tail.strip() == '')):
                        parent.remove(el)
                        changed = True
                else:
                    changed = prune(el) or changed
            ever_changed = changed or ever_changed
            if not changed:
                return ever_changed

    prune(doc.getroot())
    print(etree.tostring(doc.getroot(), encoding='unicode'))
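
As an alternative to looping until nothing changes, the same result can probably be reached in one post-order traversal: prune each child's subtree first, then decide whether the child itself has become empty. The following is a minimal sketch of that different technique, not the script above; it assumes Python 3, and the function name prune_bottom_up and the sample document are hypothetical. It keeps the same notion of "empty" (no child elements, only whitespace in text and tail).

```python
import xml.etree.ElementTree as ET

def prune_bottom_up(el):
    """Prune each child's subtree first, then drop children left empty."""
    for child in list(el):  # copy, because el is mutated inside the loop
        prune_bottom_up(child)
        no_children = len(child) == 0
        no_text = not (child.text or "").strip()
        no_tail = not (child.tail or "").strip()
        if no_children and no_text and no_tail:
            el.remove(child)

# Hypothetical sample, shortened from the third question.
SAMPLE = """<CreateOfficeGroup>
  <CreateOfficeName>John</CreateOfficeName>
  <CreateOfficeCode>
    <OperatorId>ve</OperatorId>
    <Attributes></Attributes>
  </CreateOfficeCode>
  <CreateOfficeSize>
    <Chairs></Chairs>
    <Tables></Tables>
  </CreateOfficeSize>
</CreateOfficeGroup>"""

root = ET.fromstring(SAMPLE)
prune_bottom_up(root)
tags = [el.tag for el in root.iter()]
print(tags)
```

Because children are handled before their parent is tested, CreateOfficeSize is removed in the same traversal that removes Chairs and Tables, while CreateOfficeCode survives because OperatorId still carries text.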
    
