Bash, Remove empty XML tags

Views: 87 · Published: 2016/8/3 11:46:29 · Tags: xml, linux, bash, sed


Problem description

I need some help with a couple of questions, using bash tools.

First, I want to remove empty XML tags from a file, e.g.:

 <CreateOfficeCode>
      <OperatorId>ve</OperatorId>
      <OfficeCode>1234</OfficeCode>
      <CountryCodeLength>0</CountryCodeLength>
      <AreaCodeLength>3</AreaCodeLength>
      <Attributes></Attributes>
      <ChargeArea></ChargeArea>
 </CreateOfficeCode>

to become:

 <CreateOfficeCode>
      <OperatorId>ve</OperatorId>
      <OfficeCode>1234</OfficeCode>
      <CountryCodeLength>0</CountryCodeLength>
      <AreaCodeLength>3</AreaCodeLength>
 </CreateOfficeCode>

For this, I have used the following command:

sed -i '/><\//d' file

This is not very strict; it is more of a trick. Something more appropriate would be to find each <pattern></pattern> pair and remove it. Any suggestions?

Second, how do I go from:

 <CreateOfficeGroup>
       <CreateOfficeName>John</CreateOfficeName>
       <CreateOfficeCode>
       </CreateOfficeCode>
 </CreateOfficeGroup>

to:

 <CreateOfficeGroup>
       <CreateOfficeName>John</CreateOfficeName>
 </CreateOfficeGroup>

Third, how do I do both as a whole, going from:

 <CreateOfficeGroup>
       <CreateOfficeName>John</CreateOfficeName>
       <CreateOfficeCode>
            <OperatorId>ve</OperatorId>
            <OfficeCode>1234</OfficeCode>
            <CountryCodeLength>0</CountryCodeLength>
            <AreaCodeLength>3</AreaCodeLength>
            <Attributes></Attributes>
            <ChargeArea></ChargeArea>
       </CreateOfficeCode>
       <CreateOfficeSize>
            <Chairs></Chairs>
            <Tables></Tables>
       </CreateOfficeSize>
 </CreateOfficeGroup>

to:

 <CreateOfficeGroup>
       <CreateOfficeName>John</CreateOfficeName>
       <CreateOfficeCode>
            <OperatorId>ve</OperatorId>
            <OfficeCode>1234</OfficeCode>
            <CountryCodeLength>0</CountryCodeLength>
            <AreaCodeLength>3</AreaCodeLength>
       </CreateOfficeCode>
 </CreateOfficeGroup>

Could you answer each of these questions individually? Thank you very much!

Solution

XMLStarlet is a command-line XML processor. Doing what you want with it is a one-line operation (until the desired recursive behavior is added), and it will work for all variants of XML syntax describing the same input (for example, <Attributes/> as well as <Attributes></Attributes>):

The simple version:

xmlstarlet ed \
  -d '//*[not(./*) and (not(./text()) or normalize-space(./text())="")]' \
  input.xml
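
To make the XPath predicate concrete, here is a rough Python sketch of what a single pass selects: elements with no child elements whose text is missing or whitespace-only. The helper names is_empty_leaf and strip_once and the trimmed-down SAMPLE document are made up for illustration, and the sketch uses Python's xml.etree.ElementTree rather than XMLStarlet, so treat it as an approximation of the predicate, not as how XMLStarlet implements it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, shortened from the third question's input.
SAMPLE = """<CreateOfficeGroup>
  <CreateOfficeName>John</CreateOfficeName>
  <CreateOfficeSize>
    <Chairs></Chairs>
    <Tables></Tables>
  </CreateOfficeSize>
</CreateOfficeGroup>"""


def is_empty_leaf(el):
    # Roughly: not(./*) and (not(./text()) or normalize-space(./text())="")
    return len(el) == 0 and (el.text is None or el.text.strip() == "")


def strip_once(root):
    # Select every match on the unmodified tree first, then delete,
    # so a parent emptied by this pass is not itself removed.
    matches = [(parent, child)
               for parent in root.iter()
               for child in parent
               if is_empty_leaf(child)]
    for parent, child in matches:
        parent.remove(child)
    return len(matches)


root = ET.fromstring(SAMPLE)
removed = strip_once(root)
print(removed, [el.tag for el in root.iter()])
```

On this sample a single pass removes Chairs and Tables but leaves the now-empty CreateOfficeSize in place, which is why a repeated pass is needed for the recursive case.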

The fancy version, which repeats the deletion until the document stops changing:

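strip_recursively() {
  local doc last_doc
  # Read all of stdin into doc (no NUL delimiter is found, so read consumes everything).
  IFS= read -r -d '' doc
  while :; do
    last_doc=$doc
    # One deletion pass: remove elements with no children and no non-whitespace text.
    doc=$(xmlstarlet ed \
           -d '//*[not(./*) and (not(./text()) or normalize-space(./text())="")]' \
           /dev/stdin <<<"$last_doc")
    # Stop once a pass leaves the document unchanged.
    if [[ $doc = "$last_doc" ]]; then
      printf '%s\n' "$doc"
      return
    fi
  done
}
strip_recursively <input.xml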

/dev/stdin is used rather than - (at some cost to platform portability) for better portability across releases of XMLStarlet; adjust to taste.
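
The loop above is a fixed-point iteration: run one deletion pass, compare the result with the previous document, and stop when they match. A minimal Python sketch of the same control flow follows. It assumes Python's xml.etree.ElementTree stands in for XMLStarlet, and strip_pass is a hypothetical helper that only approximates the XPath predicate.

```python
import xml.etree.ElementTree as ET


def strip_pass(xml_text):
    # One deletion pass over a serialized document (hypothetical helper).
    root = ET.fromstring(xml_text)
    matches = [(parent, child)
               for parent in root.iter()
               for child in parent
               if len(child) == 0 and not (child.text or "").strip()]
    for parent, child in matches:
        parent.remove(child)
    return ET.tostring(root, encoding="unicode")


def strip_recursively(xml_text):
    # Re-run the pass until the serialized output stops changing,
    # as the shell loop does by comparing $doc with $last_doc.
    doc = xml_text
    while True:
        last_doc = doc
        doc = strip_pass(last_doc)
        if doc == last_doc:
            return doc


result = strip_recursively(
    "<Group><Name>John</Name><Size><Chairs></Chairs><Tables/></Size></Group>"
)
print(result)
```

Comparing serialized text keeps the stopping rule identical to the shell version, at the cost of reparsing on every pass; the root element is never removed because it is not a child of anything.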

On a system with only older dependencies installed, the XML parser most likely to be available is the one bundled with Python:

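#!/usr/bin/env python3
"""Remove empty leaf elements from an XML document read on stdin."""

import sys
import xml.etree.ElementTree as etree


def prune(parent):
    """Repeatedly remove empty children of parent; return True if anything was removed."""
    ever_changed = False
    while True:
        changed = False
        # list(parent) copies the child list, so removing while iterating is safe
        # (Element.getchildren() was removed in Python 3.9).
        for el in list(parent):
            if len(el) == 0:
                # Keep the element if it has text, or is followed by text (its tail).
                if ((el.text is None or el.text.strip() == '') and
                        (el.tail is None or el.tail.strip() == '')):
                    parent.remove(el)
                    changed = True
            else:
                # Call prune first so it is never skipped by short-circuiting.
                changed = prune(el) or changed
        ever_changed = changed or ever_changed
        if not changed:
            return ever_changed


doc = etree.parse(sys.stdin)
prune(doc.getroot())
print(etree.tostring(doc.getroot(), encoding='unicode'))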
