Removing an element from a parsed XML tree disrupts iteration


Question

I want to parse an XML file and then process the resulting tree by removing selected elements. My problem is that removing an element disrupts the loop that iterates over the elements.

Consider the following XML data:

<results>
    <group>
        <a />
        <b />
        <c />
    </group>
</results>

And the code:

import xml.etree.ElementTree as ET

def showGroup(group,s):
    print(s + '  len=' + str(len(group)))
    print('<group>' )
    for e in group:
        print('   <' + e.tag + '>')
    print('</group>\n')

def processGroup(group):
    for e in group:
        if e.tag != 'a':
            group.remove(e)
            showGroup(group,'removed <' + e.tag + '>')

tree = ET.parse('x.xml')
root = tree.getroot()

for group in root:
    processGroup(group)

I expected the for loop to process elements <a>, <b>, and <c> in order. In particular:

  1. processing <a> should not remove any element
  2. processing <b> should remove <b>
  3. processing <c> should remove <c>

I expected the resulting tree to have a single element inside <group> (the <a> element), and len(group) to return 1.

Instead, after processing <b>, the for loop decides that its end condition has been met and never processes <c>. If it did, <c> would be removed. I am left with a tree containing <a> and <c>, and len(group) returns 2.

Update: an ugly hack "fixes" the problem at the cost of some efficiency, provided no code runs after the element is removed. In my real program, however, there is a lot of code after the pruning loop.

for e in group:
    if e.tag != 'a':
        group.remove(e)
        showGroup(group,'removed <' + e.tag + '>')
        processGroup(group)

I assume that if the for loop is disrupted, starting again from the beginning of the group might solve the problem. Recursion is a tidy way of doing that, at the expense of reprocessing every element that has already been checked but not removed.

I am not satisfied with this solution.

Recommended answer

The issue is that you are removing elements from the same container you are iterating over. When you remove an element, the remaining elements shift down by one position, so the loop's next index skips the element that moved into the freed slot. That is why <c> is never visited after <b> is removed.
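The skipped element is easy to observe directly. The following is a minimal sketch that parses the question's XML from a string instead of a file; the helper name prune_in_place and the visited list are introduced here only for illustration.

```python
import xml.etree.ElementTree as ET

XML = "<results><group><a /><b /><c /></group></results>"

def prune_in_place(group):
    """Deliberately buggy: removes children while iterating over the same element."""
    visited = []
    for e in group:
        visited.append(e.tag)   # record which children the loop actually reaches
        if e.tag != 'a':
            group.remove(e)     # shifts the later children down by one position
    return visited

group = ET.fromstring(XML).find('group')
visited = prune_in_place(group)
remaining = [e.tag for e in group]

print(visited)     # ['a', 'b']  (<c> is never visited)
print(remaining)   # ['a', 'c']
```

The loop reaches index 2 after <b> is removed, but by then the group has only two children, so iteration stops before <c> is examined.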

A simple solution is to iterate over a copy of the group's children, or to iterate with reversed:

Copy:

def processGroup(group):
    # group[:] is a shallow copy of the child list, so we iterate over
    # the copy while removing from the original.
    for e in group[:]:
        if e.tag != 'a':
            group.remove(e)
            showGroup(group,'removed <' + e.tag + '>')

Reversed:

def processGroup(group):
    # Iterate from the end. Removing an element only shifts the elements
    # after it, which have already been visited, so the positions still
    # to be visited are unaffected.
    for e in reversed(group):
        if e.tag != 'a':
            group.remove(e)
            showGroup(group,'removed <' + e.tag + '>')

Using the copy logic:

In [7]: tree = ET.parse('test.xml')

In [8]: root = tree.getroot()

In [9]: for group in root:
   ...:         processGroup(group)
   ...:     
removed <b>  len=2
<group>
   <a>
   <c>
</group>

removed <c>  len=1
<group>
   <a>
</group>
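Both variants can also be checked without a file on disk. This is a small sketch, not part of the original answer's session: it parses the question's XML from a string, and the names prune_copy, prune_reversed, and remaining_tags are hypothetical helpers used only here.

```python
import xml.etree.ElementTree as ET

XML = "<results><group><a /><b /><c /></group></results>"

def prune_copy(group):
    # Iterate over a shallow copy of the children; remove from the original.
    for e in group[:]:
        if e.tag != 'a':
            group.remove(e)

def prune_reversed(group):
    # Iterate from the last child to the first; removals only shift
    # children that have already been visited.
    for e in reversed(group):
        if e.tag != 'a':
            group.remove(e)

def remaining_tags(prune):
    # Parse a fresh tree for each variant so the runs do not affect each other.
    group = ET.fromstring(XML).find('group')
    prune(group)
    return [e.tag for e in group]

copy_result = remaining_tags(prune_copy)
reversed_result = remaining_tags(prune_reversed)
print(copy_result)       # ['a']
print(reversed_result)   # ['a']
```

The copy variant visits children in document order, which matters if the order of the printed "removed" messages is important; the reversed variant avoids building a temporary list.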

You can also use ET.tostring in place of the for loop in showGroup:

import xml.etree.ElementTree as ET

def show_group(group,s):
    print(s + '  len=' + str(len(group)))
    print(ET.tostring(group))


def process_group(group):
    for e in group[:]:
        if e.tag != 'a':
            group.remove(e)
            show_group(group, 'removed <' + e.tag + '>')

tree = ET.parse('test.xml')
root = tree.getroot()

for group in root.findall(".//group"):
    process_group(group)
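One detail worth knowing when printing the result, assuming Python 3: ET.tostring returns bytes by default, so print shows a b'...' literal. Passing encoding="unicode" returns a str instead. A minimal sketch with an inline element:

```python
import xml.etree.ElementTree as ET

group = ET.fromstring("<group><a /><b /></group>")

as_bytes = ET.tostring(group)                     # bytes (default encoding)
as_text = ET.tostring(group, encoding="unicode")  # str

print(as_bytes)   # b'<group><a /><b /></group>'
print(as_text)    # <group><a /><b /></group>
```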
