python - how to write empty tree node as empty string to xml file


Question

I want to remove elements with a certain tag and then write out the .xml file WITHOUT any tags for those deleted elements. Is my only option to create a new tree?

There are two options to remove/delete an element:

clear(): Resets an element. This function removes all subelements, clears all attributes, and sets the text and tail attributes to None.

At first I used this, and it works for removing the data from the element, but I'm still left with an empty element:

# Remove all elements from the tree that are NOT "job" or "make" or "build" elements
# (assumes tree is a parsed ElementTree and root = tree.getroot())
log = open("debug.log", "w")
for el in root.iter("*"):
    if el.tag != "job" and el.tag != "make" and el.tag != "build":
        print("removed = ", el.tag, el.attrib, file=log)
        el.clear()
    else:
        print("NOT", el.tag, el.attrib, file=log)

log.close()
tree.write("make_and_job_tree.xml", short_empty_elements=False)

The problem is that xml.etree.ElementTree.ElementTree.write() still writes out the empty tags no matter what:

...The keyword-only short_empty_elements parameter controls the formatting of elements that contain no content. If True (the default), they are emitted as a single self-closed tag, otherwise they are emitted as a pair of start/end tags.

Why isn't there an option to just not print out those empty tags! Whatever.

So I thought I could try:

remove(subelement): Removes subelement from the element. Unlike the find* methods, this method compares elements based on the instance identity, not on tag value or contents.

But this only operates on the child elements.

So I have to do something like:

for el in root.iter("*"):
    for subel in el:
        if subel.tag != "make" and subel.tag != "job" and subel.tag != "build":
            el.remove(subel)

But there's a big problem here: I'm invalidating the iterator by removing elements, right?

Is it enough to simply check whether the element is empty by adding if subel?

if subel and subel.tag != "make" and subel.tag != "job" and subel.tag != "build"

Or do I have to get a new iterator over the tree elements every time I invalidate it?

Remember: I just want to write out the xml file with no tags for the empty elements.

Here's an example.

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

Let's say I want to remove any mention of neighbor. Ideally, I'd want this output after the removal:

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
    </country>
</data>

The problem is that when I run the code using clear() (see the first code block above) and write it to a file, I get this:

<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor></neighbor><neighbor></neighbor></country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor></neighbor></country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor></neighbor><neighbor></neighbor></country>
</data>

Notice that neighbor still appears.

I know I could easily run a regex over the output, but there's got to be a way (or another Python API) that does this on the fly instead of requiring me to touch my .xml file again.

Recommended answer

import lxml.etree as et

xml  = et.parse("test.xml")

for node in xml.xpath("//neighbor"):
    node.getparent().remove(node)


xml.write("out.xml",encoding="utf-8",xml_declaration=True)

Using ElementTree, we need to find the parents of the neighbor nodes, then find the neighbor nodes inside each parent and remove them:

from xml.etree import ElementTree as et

xml = et.parse("test.xml")

for parent in xml.getroot().findall(".//neighbor/.."):
    for child in parent.findall("./neighbor"):
        parent.remove(child)

xml.write("out.xml", encoding="utf-8", xml_declaration=True)

Both will give you:

<?xml version='1.0' encoding='utf-8'?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        </country>
</data>
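
The misaligned `</country>` tags in this output come from whitespace. Each removed neighbor takes its own trailing whitespace with it, while the preceding sibling keeps a tail that was indented for another child. A minimal sketch of one way to tidy this up, assuming Python 3.9 or later where `xml.etree.ElementTree.indent` is available; the in-memory `SAMPLE` string and the variable names are illustrative, not part of the original answer:

```python
import xml.etree.ElementTree as ET

# Illustrative sample; a shortened version of the question's data.
SAMPLE = """<data>
    <country name="Singapore">
        <rank>4</rank>
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>"""

root = ET.fromstring(SAMPLE)

# Remove every <neighbor> through its parent, as in the ElementTree answer.
for parent in root.findall(".//neighbor/.."):
    for child in parent.findall("./neighbor"):
        parent.remove(child)

# Re-indent the tree in place so closing tags line up again (Python 3.9+).
ET.indent(root, space="    ")

result = ET.tostring(root, encoding="unicode")
print(result)
```

On older Python versions `ET.indent` does not exist, so the whitespace would have to be adjusted by hand through the `text` and `tail` attributes.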

Using your attribute logic and modifying the XML a bit, like below:

x = """<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
           <neighbor name="Costa Rica" direction="W" make="foo" build="bar" job="blah"/>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W" make="foo" build="bar" job="blah"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>"""

Using lxml:

import lxml.etree as et

xml = et.fromstring(x)

for node in xml.xpath("//neighbor[not(@make) and not(@job) and not(@build)]"):
    node.getparent().remove(node)
print(et.tostring(xml))

Which will give you:

 <data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Costa Rica" direction="W" make="foo" build="bar" job="blah"/>
        </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W" make="foo" build="bar" job="blah"/>
        </country>
</data>

The same logic in ElementTree:

from xml.etree import ElementTree as et

xml = et.parse("test.xml").getroot()

atts = {"build", "job", "make"}

for parent in xml.findall(".//neighbor/.."):
    # "./neighbor" matches direct children only, which is what remove() requires
    for child in parent.findall("./neighbor"):
        if not atts.issubset(child.attrib):
            parent.remove(child)
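
The two filters are not quite the same test: the lxml XPath predicate drops a neighbor only when the listed attributes are all absent, whereas `issubset` drops it when any one of them is missing. On the sample data the results coincide, but they can differ for partially attributed nodes. A small sketch of the difference, where the `drop_neighbors` helper and the `Partial` sample entry are hypothetical and made up for illustration:

```python
import xml.etree.ElementTree as ET

# Illustrative sample; the "Partial" neighbor is invented to show the difference.
SAMPLE = """<data>
    <country name="Panama">
        <neighbor name="Costa Rica" make="foo" build="bar" job="blah"/>
        <neighbor name="Colombia" direction="E"/>
        <neighbor name="Partial" make="foo"/>
    </country>
</data>"""

REQUIRED = {"build", "job", "make"}


def drop_neighbors(root, should_drop):
    """Remove each direct-child <neighbor> for which should_drop(attrib) is true.

    Returns the names of the neighbors that remain.
    """
    for parent in root.findall(".//neighbor/.."):
        # findall() returns a list, so removing while looping over it is safe.
        for child in parent.findall("./neighbor"):
            if should_drop(child.attrib):
                parent.remove(child)
    return [n.get("name") for n in root.iter("neighbor")]


# Drop when ANY of the attributes is missing (the issubset test).
strict = drop_neighbors(ET.fromstring(SAMPLE), lambda a: not REQUIRED.issubset(a))

# Drop only when NONE of the attributes is present.
lenient = drop_neighbors(ET.fromstring(SAMPLE), lambda a: REQUIRED.isdisjoint(a))

print(strict)
print(lenient)
```

Which of the two is wanted depends on whether a neighbor carrying only some of the attributes should survive.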

If you are using iter:

from xml.etree import ElementTree as et

xml = et.parse("test.xml")

for parent in xml.getroot().iter("*"):
    # assign a real list; a generator may be consumed before the slice is set
    parent[:] = [child for child in parent if child.tag != "neighbor"]

You can see we get the exact same output:

In [30]: !cat /home/padraic/untitled6/test.xml
<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">#
      <neighbor name="Austria" direction="E"/>
        <rank>1</rank>
        <neighbor name="Austria" direction="E"/>
        <year>2008</year>
      <neighbor name="Austria" direction="E"/>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
In [31]: paste
def test():
    import lxml.etree as et
    xml = et.parse("/home/padraic/untitled6/test.xml")
    for node in xml.xpath("//neighbor"):
        node.getparent().remove(node)
    a = et.tostring(xml)
    from xml.etree import ElementTree as et
    xml = et.parse("/home/padraic/untitled6/test.xml")
    for parent in xml.getroot().iter("*"):
        parent[:] = (child for child in parent if child.tag != "neighbor")
    b = et.tostring(xml.getroot())
    assert  a == b

## -- End pasted text --

In [32]: test()
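
The slice-assignment approach can be wrapped in a small reusable helper. The following is a minimal sketch rather than part of the original answer: the function name `strip_tag` and the in-memory `SAMPLE` are assumptions made for illustration. It snapshots the elements before modifying the tree and assigns a real list to the slice, which avoids relying on how a particular ElementTree implementation walks a tree that is changing underneath it.

```python
import xml.etree.ElementTree as ET


def strip_tag(root, tag):
    """Remove every descendant of root whose tag equals `tag`, in place."""
    # Snapshot the elements first so the tree is not modified while iter() walks it.
    for parent in list(root.iter()):
        # Keep only the children with a different tag.
        parent[:] = [child for child in parent if child.tag != tag]
    return root


# Illustrative sample with neighbors interleaved between other children.
SAMPLE = """<data>
    <country name="Liechtenstein">
        <neighbor name="Austria" direction="E"/>
        <rank>1</rank>
        <neighbor name="Switzerland" direction="W"/>
        <year>2008</year>
    </country>
</data>"""

root = strip_tag(ET.fromstring(SAMPLE), "neighbor")
remaining = [el.tag for el in root.iter()]
print(remaining)
```

Because whole children are dropped rather than cleared, nothing is left behind for write() to emit as an empty tag.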
