Bash, Remove empty XML tags
Problem description
I need some help with a couple of questions, using bash tools.
- First, I want to remove empty XML tags from a file, e.g.:
<CreateOfficeCode>
  <OperatorId>ve</OperatorId>
  <OfficeCode>1234</OfficeCode>
  <CountryCodeLength>0</CountryCodeLength>
  <AreaCodeLength>3</AreaCodeLength>
  <Attributes></Attributes>
  <ChargeArea></ChargeArea>
</CreateOfficeCode>
to become:
<CreateOfficeCode>
  <OperatorId>ve</OperatorId>
  <OfficeCode>1234</OfficeCode>
  <CountryCodeLength>0</CountryCodeLength>
  <AreaCodeLength>3</AreaCodeLength>
</CreateOfficeCode>
For this I have used this command:
sed -i '/><\//d' file
which is not so strict; it's more of a trick. Something more appropriate would be to find the <pattern></pattern> pair and remove it. Suggestions?
- Second, how to go from:
<CreateOfficeGroup>
  <CreateOfficeName>John</CreateOfficeName>
  <CreateOfficeCode>
  </CreateOfficeCode>
</CreateOfficeGroup>
to:
<CreateOfficeGroup>
  <CreateOfficeName>John</CreateOfficeName>
</CreateOfficeGroup>
- Third, how to do both as a whole, going from:
<CreateOfficeGroup>
  <CreateOfficeName>John</CreateOfficeName>
  <CreateOfficeCode>
    <OperatorId>ve</OperatorId>
    <OfficeCode>1234</OfficeCode>
    <CountryCodeLength>0</CountryCodeLength>
    <AreaCodeLength>3</AreaCodeLength>
    <Attributes></Attributes>
    <ChargeArea></ChargeArea>
  </CreateOfficeCode>
  <CreateOfficeSize>
    <Chairs></Chairs>
    <Tables></Tables>
  </CreateOfficeSize>
</CreateOfficeGroup>
to:
<CreateOfficeGroup>
  <CreateOfficeName>John</CreateOfficeName>
  <CreateOfficeCode>
    <OperatorId>ve</OperatorId>
    <OfficeCode>1234</OfficeCode>
    <CountryCodeLength>0</CountryCodeLength>
    <AreaCodeLength>3</AreaCodeLength>
  </CreateOfficeCode>
</CreateOfficeGroup>
Can you answer the questions individually? Thank you very much!
XMLStarlet is a command-line XML processor. Doing what you want with it is a one-line operation (until the desired recursive behavior is added), and it will work for all variants of XML syntax describing the same input (for example, <Attributes/> as well as <Attributes></Attributes>):
The simple version:
xmlstarlet ed \
  -d '//*[not(./*) and (not(./text()) or normalize-space(./text())="")]' \
  input.xml
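To illustrate why a single pass is not yet recursive, here is a rough Python sketch that approximates the XPath predicate above (no child elements, and no text other than whitespace). The function name strip_empty_leaves_once and the trimmed sample document are assumptions made for illustration, not part of XMLStarlet. Unlike the XPath expression, this sketch never removes the root element.

```python
import xml.etree.ElementTree as ET

def strip_empty_leaves_once(root):
    """Remove, in ONE pass, every element with no children and only blank text.

    Returns the number of elements removed.
    """
    # Collect the (parent, child) pairs first, then delete, so the tree is
    # not modified while it is being traversed.
    doomed = [
        (parent, child)
        for parent in root.iter()
        for child in list(parent)
        if len(child) == 0 and not (child.text or "").strip()
    ]
    for parent, child in doomed:
        parent.remove(child)
    return len(doomed)

# Trimmed-down version of the third example from the question.
xml = """<CreateOfficeGroup>
<CreateOfficeName>John</CreateOfficeName>
<CreateOfficeSize>
<Chairs></Chairs>
<Tables></Tables>
</CreateOfficeSize>
</CreateOfficeGroup>"""

root = ET.fromstring(xml)
first = strip_empty_leaves_once(root)   # removes Chairs and Tables
second = strip_empty_leaves_once(root)  # removes the now-empty CreateOfficeSize
print(first, second)
```

The first pass only sees Chairs and Tables as empty; CreateOfficeSize becomes empty as a result of that pass, so a second pass is needed to remove it. Repeating until nothing changes gives the recursive behavior.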
The fancy version:
strip_recursively() {
  local doc last_doc
  IFS= read -r -d '' doc   # read all of stdin into doc
  while :; do
    last_doc=$doc
    doc=$(xmlstarlet ed \
      -d '//*[not(./*) and (not(./text()) or normalize-space(./text())="")]' \
      /dev/stdin <<<"$last_doc")
    # stop once a pass no longer changes the document
    if [[ $doc = "$last_doc" ]]; then
      printf '%s\n' "$doc"
      return
    fi
  done
}
strip_recursively <input.xml
/dev/stdin is used rather than - (at some cost to platform portability) for better portability across releases of XMLStarlet; adjust to taste.
On a system with only older dependencies installed, the XML parser most likely to be available already is the one bundled with Python.
#!/usr/bin/env python3
import sys
import xml.etree.ElementTree as etree

doc = etree.parse(sys.stdin)

def prune(parent):
    """Remove empty descendants of parent; return True if anything was removed."""
    ever_changed = False
    while True:  # repeat until a full pass removes nothing
        changed = False
        # list(parent) replaces getchildren(), which was removed in Python 3.9
        for el in list(parent):
            if len(el) == 0:
                # a leaf is empty when its text and its trailing text are blank
                if ((el.text is None or el.text.strip() == '') and
                        (el.tail is None or el.tail.strip() == '')):
                    parent.remove(el)
                    changed = True
            else:
                changed = prune(el) or changed
        ever_changed = changed or ever_changed
        if not changed:
            return ever_changed

prune(doc.getroot())
print(etree.tostring(doc.getroot(), encoding='unicode'))
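As an alternative to looping until nothing changes, the same pruning can be sketched as a single bottom-up (post-order) traversal: children are pruned first, so a parent that becomes empty is caught on the way back up. This is a minimal sketch, not the answer's own method. The names is_blank and prune_bottom_up and the shortened sample document are assumptions. As with the script above, an element whose trailing text is not blank is kept, and attributes are ignored when deciding whether an element is empty.

```python
import xml.etree.ElementTree as ET

def is_blank(s):
    """True for None or whitespace-only strings."""
    return s is None or not s.strip()

def prune_bottom_up(el):
    """Prune el's children first, then report whether el itself is now empty."""
    for child in list(el):  # iterate over a copy, since children are removed
        # Keep a child whose trailing text is not blank, so no text is lost.
        if prune_bottom_up(child) and is_blank(child.tail):
            el.remove(child)
    return len(el) == 0 and is_blank(el.text)

# Shortened version of the third example from the question.
SAMPLE = """<CreateOfficeGroup>
<CreateOfficeName>John</CreateOfficeName>
<CreateOfficeCode>
<OperatorId>ve</OperatorId>
<Attributes></Attributes>
</CreateOfficeCode>
<CreateOfficeSize>
<Chairs></Chairs>
<Tables></Tables>
</CreateOfficeSize>
</CreateOfficeGroup>"""

root = ET.fromstring(SAMPLE)
prune_bottom_up(root)
remaining = [el.tag for el in root.iter()]
print(remaining)
```

Each element is visited once, so the whole tree is processed in one traversal. The root itself is never removed here; the caller can check the return value of prune_bottom_up(root) if an entirely empty document needs special handling.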