python - how to write empty tree node as empty string to xml file
Question
I want to remove elements of a certain tag value and then write out the .xml
file WITHOUT any tags for those deleted elements; is my only option to create a new tree?
There are two options to remove/delete an element:
clear() Resets an element. This function removes all subelements, clears all attributes, and sets the text and tail attributes to None.
At first I used this and it works for the purpose of removing the data from the element but I'm still left with an empty element:
# Remove all elements from the tree that are NOT "job" or "make" or "build" elements
log = open("debug.log", "w")
for el in root.iter("*"):
    if el.tag != "job" and el.tag != "make" and el.tag != "build":
        print("removed = ", el.tag, el.attrib, file=log)
        el.clear()
    else:
        print("NOT", el.tag, el.attrib, file=log)
log.close()
tree.write("make_and_job_tree.xml", short_empty_elements=False)
The problem is that xml.etree.ElementTree.ElementTree.write() still writes out the empty tags no matter what:
...The keyword-only short_empty_elements parameter controls the formatting of elements that contain no content. If True (the default), they are emitted as a single self-closed tag, otherwise they are emitted as a pair of start/end tags.
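To make the quoted behaviour concrete, here is a minimal sketch (the one-country document is made up for illustration). It suggests that a cleared element is still serialized either way, and that short_empty_elements only chooses between the two spellings:

```python
import xml.etree.ElementTree as ET

# Hypothetical one-country document, used only for illustration.
root = ET.fromstring('<country><rank>1</rank><neighbor name="Austria"/></country>')

# clear() empties the element but leaves it attached to its parent.
root.find("neighbor").clear()

# Default: empty elements are written as a single self-closed tag.
self_closed = ET.tostring(root, encoding="unicode")
# short_empty_elements=False: written as a start/end tag pair instead.
open_close = ET.tostring(root, encoding="unicode", short_empty_elements=False)

print(self_closed)
print(open_close)
```

In both cases the neighbor tag is still present, which matches the output shown further down in the question.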
Why isn't there an option to just not print out those empty tags! Whatever.
So I thought I could try:
remove(subelement) Removes subelement from the element. Unlike the find* methods this method compares elements based on the instance identity, not on tag value or contents.
But this only operates on the child elements.
So I'd have to do something like:
for el in root.iter("*"):
    for subel in el:
        if subel.tag != "make" and subel.tag != "job" and subel.tag != "build":
            el.remove(subel)
But there's a big problem here: I'm invalidating the iterator by removing elements, right?
Is it enough to simply check if the element is empty by adding if subel, like this?
if subel and subel.tag != "make" and subel.tag != "job" and subel.tag != "build":
Or do I have to get a new iterator to the tree elements every time I invalidate it?
Remember: I just wanted to write out the xml file with no tags for the empty elements.
Here is an example.
<?xml version="1.0"?>
<data>
<country name="Liechtenstein">
<rank>1</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
</country>
<country name="Singapore">
<rank>4</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor name="Malaysia" direction="N"/>
</country>
<country name="Panama">
<rank>68</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W"/>
<neighbor name="Colombia" direction="E"/>
</country>
</data>
Let's say I want to remove any mention of neighbor. Ideally, I'd want this output after the removal:
<?xml version="1.0"?>
<data>
<country name="Liechtenstein">
<rank>1</rank>
<year>2008</year>
<gdppc>141100</gdppc>
</country>
<country name="Singapore">
<rank>4</rank>
<year>2011</year>
<gdppc>59900</gdppc>
</country>
<country name="Panama">
<rank>68</rank>
<year>2011</year>
<gdppc>13600</gdppc>
</country>
</data>
The problem is that when I run the code using clear() (see the first code block above) and write it to a file, I get this:
<data>
<country name="Liechtenstein">
<rank>1</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor></neighbor><neighbor></neighbor></country>
<country name="Singapore">
<rank>4</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor></neighbor></country>
<country name="Panama">
<rank>68</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor></neighbor><neighbor></neighbor></country>
</data>
Notice that neighbor still appears.
I know I could easily run a regex over the output, but there's gotta be a way (or another Python API) that does this on the fly instead of requiring me to touch my .xml file again.
Recommended answer
Using lxml:
import lxml.etree as et

xml = et.parse("test.xml")
for node in xml.xpath("//neighbor"):
    node.getparent().remove(node)
xml.write("out.xml", encoding="utf-8", xml_declaration=True)
Using ElementTree, we need to find the parents of the neighbor nodes, then find the neighbor nodes inside each parent and remove them:
from xml.etree import ElementTree as et

xml = et.parse("test.xml")
for parent in xml.getroot().findall(".//neighbor/.."):
    for child in parent.findall("./neighbor"):
        parent.remove(child)
xml.write("out.xml", encoding="utf-8", xml_declaration=True)
Both will give you:
<?xml version='1.0' encoding='utf-8'?>
<data>
<country name="Liechtenstein">
<rank>1</rank>
<year>2008</year>
<gdppc>141100</gdppc>
</country>
<country name="Singapore">
<rank>4</rank>
<year>2011</year>
<gdppc>59900</gdppc>
</country>
<country name="Panama">
<rank>68</rank>
<year>2011</year>
<gdppc>13600</gdppc>
</country>
</data>
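ElementTree elements carry no reference to their parent, which is why the ElementTree version has to look the parents up first. Another common workaround is to build a child-to-parent map once and use it in place of lxml's getparent(). The following is only a sketch under that assumption; the helper name remove_by_tag and the trimmed sample document are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Trimmed-down version of the sample document (illustrative only).
SAMPLE = """<data>
  <country name="Singapore">
    <rank>4</rank>
    <neighbor name="Malaysia" direction="N"/>
  </country>
  <country name="Panama">
    <rank>68</rank>
    <neighbor name="Costa Rica" direction="W"/>
    <neighbor name="Colombia" direction="E"/>
  </country>
</data>"""


def remove_by_tag(root, tag):
    """Hypothetical helper: detach every descendant of root with the given tag."""
    # Map each element to its parent; the root itself has no entry.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # Materialize the matches first so the tree is not modified mid-walk.
    for node in list(root.iter(tag)):
        parent_of[node].remove(node)  # KeyError if root itself has this tag


root = ET.fromstring(SAMPLE)
remove_by_tag(root, "neighbor")
remaining_tags = [el.tag for el in root.iter()]
print(remaining_tags)
```

The map costs one extra pass over the tree, but it works for any tag without relying on the ".." path step.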
Using your attribute logic and modifying the XML a bit, like below:
x = """<?xml version="1.0"?>
<data>
<country name="Liechtenstein">
<rank>1</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
</country>
<country name="Singapore">
<rank>4</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor name="Costa Rica" direction="W" make="foo" build="bar" job="blah"/>
<neighbor name="Malaysia" direction="N"/>
</country>
<country name="Panama">
<rank>68</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W" make="foo" build="bar" job="blah"/>
<neighbor name="Colombia" direction="E"/>
</country>
</data>"""
Using lxml:
import lxml.etree as et

xml = et.fromstring(x)
# Remove neighbor nodes that have none of the make/job/build attributes
for node in xml.xpath("//neighbor[not(@make) and not(@job) and not(@build)]"):
    node.getparent().remove(node)
print(et.tostring(xml).decode())
This will give you:
<data>
<country name="Liechtenstein">
<rank>1</rank>
<year>2008</year>
<gdppc>141100</gdppc>
</country>
<country name="Singapore">
<rank>4</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor name="Costa Rica" direction="W" make="foo" build="bar" job="blah"/>
</country>
<country name="Panama">
<rank>68</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W" make="foo" build="bar" job="blah"/>
</country>
</data>
The same logic in ElementTree:
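from xml.etree import ElementTree as et

xml = et.parse("test.xml").getroot()
atts = {"build", "job", "make"}
for parent in xml.findall(".//neighbor/.."):
    # findall() returns a new list, so removing while looping over it is safe;
    # "./neighbor" matches direct children only, which is what remove() needs
    for child in parent.findall("./neighbor"):
        if not atts.issubset(child.attrib):
            parent.remove(child)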
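One nuance worth checking against your own data: an XPath predicate built from not(...) joined with "and" matches neighbors that carry none of the listed attributes, whereas the issubset test above matches neighbors that are missing at least one of them. On the sample XML the two agree, because every neighbor has either all three attributes or none. The sketch below uses a made-up document with a partially attributed neighbor to show where they diverge:

```python
import xml.etree.ElementTree as ET

ATTS = {"build", "job", "make"}

# Hypothetical document: "b" has only one of the three attributes.
root = ET.fromstring(
    '<country>'
    '<neighbor name="a"/>'
    '<neighbor name="b" make="foo"/>'
    '<neighbor name="c" make="foo" build="bar" job="blah"/>'
    '</country>'
)

neighbors = root.findall("neighbor")

# "Has none of the attributes" (the not(...) and not(...) style of test).
has_none = [n.get("name") for n in neighbors if ATTS.isdisjoint(n.attrib)]
# "Is missing at least one attribute" (the issubset style of test).
missing_some = [n.get("name") for n in neighbors if not ATTS.issubset(n.attrib)]

print(has_none)
print(missing_some)
```

Which of the two is right depends on whether a partially attributed element should be kept.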
If you are using iter:
from xml.etree import ElementTree as et

xml = et.parse("test.xml")
for parent in xml.getroot().iter("*"):
    parent[:] = (child for child in parent if child.tag != "neighbor")
You can see we get the exact same output:
In [30]: !cat /home/padraic/untitled6/test.xml
<?xml version="1.0"?>
<data>
<country name="Liechtenstein">#
<neighbor name="Austria" direction="E"/>
<rank>1</rank>
<neighbor name="Austria" direction="E"/>
<year>2008</year>
<neighbor name="Austria" direction="E"/>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
</country>
<country name="Singapore">
<rank>4</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor name="Malaysia" direction="N"/>
</country>
<country name="Panama">
<rank>68</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W"/>
<neighbor name="Colombia" direction="E"/>
</country>
</data>
In [31]: paste
def test():
    import lxml.etree as et
    xml = et.parse("/home/padraic/untitled6/test.xml")
    for node in xml.xpath("//neighbor"):
        node.getparent().remove(node)
    a = et.tostring(xml)
    from xml.etree import ElementTree as et
    xml = et.parse("/home/padraic/untitled6/test.xml")
    for parent in xml.getroot().iter("*"):
        parent[:] = (child for child in parent if child.tag != "neighbor")
    b = et.tostring(xml.getroot())
    assert a == b
## -- End pasted text --
In [32]: test()
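On the question of invalidating the iterator: removing children from an element while looping over that same element tends to skip siblings, because the remaining children shift down as the loop index moves up. An "if subel" check would not help, since an Element's truth value reflects whether it has child elements, not whether it is still attached to the tree. A minimal sketch of the difference, using a made-up element and hypothetical function names:

```python
import xml.etree.ElementTree as ET

# Hypothetical element with adjacent neighbor children.
SAMPLE = (
    "<country><neighbor/><neighbor/><rank>1</rank>"
    "<neighbor/><neighbor/></country>"
)


def remove_while_iterating(xml_text):
    """Removes from the live child sequence; adjacent matches can be skipped."""
    parent = ET.fromstring(xml_text)
    for child in parent:
        if child.tag == "neighbor":
            parent.remove(child)
    return [c.tag for c in parent]


def remove_over_copy(xml_text):
    """Loops over a snapshot of the children, so every child is visited."""
    parent = ET.fromstring(xml_text)
    for child in list(parent):
        if child.tag == "neighbor":
            parent.remove(child)
    return [c.tag for c in parent]


print(remove_while_iterating(SAMPLE))
print(remove_over_copy(SAMPLE))
```

Looping over list(parent), over the list returned by findall(), or rebuilding the children with a slice assignment all avoid the problem, so there is no need to fetch a new iterator after each removal.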