XML filtering with Python

Question

I have the following XML document:

<node0>
    <node1>
      <node2 a1="x1"> ... </node2>
      <node2 a1="x2"> ... </node2>
      <node2 a1="x1"> ... </node2>
    </node1>
</node0>

I want to filter out node2 when a1="x2". The user provides the XPath and the attribute values that need to be tested and filtered out. I looked at some Python solutions such as BeautifulSoup, but they are too complicated and don't preserve the case of the text. I want to keep the document the same as before, with only the matching content filtered out.

Can you recommend a simple and succinct solution? From the looks of it, this should not be too complicated. The actual XML document is not as simple as the one above, but the idea is the same.

Recommended answer

This uses xml.etree.ElementTree, which is in the standard library:

import xml.etree.ElementTree as xee
data='''\
<node1>
  <node2 a1="x1"> ... </node2>
  <node2 a1="x2"> ... </node2>
  <node2 a1="x1"> ... </node2>
</node1>
'''
doc=xee.fromstring(data)

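# findall() returns a list, so removing children inside the loop is safe
for tag in doc.findall('node2'):
    # .get() returns None instead of raising KeyError when a1 is missing
    if tag.attrib.get('a1')=='x2':
        doc.remove(tag)
# encoding='unicode' yields a str; the default in Python 3 is bytes
print(xee.tostring(doc, encoding='unicode'))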
# <node1>
#   <node2 a1="x1"> ... </node2>
#   <node2 a1="x1"> ... </node2>
# </node1>
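
ElementTree's limited XPath support also accepts an attribute predicate, so the attribute test can move into the path itself. The following is a minimal sketch, assuming Python 3 and the same sample data; the names `ET` and `filtered` are only illustrative and do not come from the original answer.

```python
import xml.etree.ElementTree as ET

data = '''\
<node1>
  <node2 a1="x1"> ... </node2>
  <node2 a1="x2"> ... </node2>
  <node2 a1="x1"> ... </node2>
</node1>
'''
doc = ET.fromstring(data)

# The predicate [@a1='x2'] selects only node2 children whose a1 equals "x2".
# findall() returns a list, so removing while looping is safe.
for tag in doc.findall("node2[@a1='x2']"):
    doc.remove(tag)

# encoding="unicode" returns a str rather than bytes
filtered = ET.tostring(doc, encoding="unicode")
print(filtered)
```

Because the path is an ordinary string, a user-supplied path and attribute value can be passed in directly, as long as they stay within the subset of XPath that ElementTree understands.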

This uses lxml, which is not in the standard library but has a more powerful syntax:

import lxml.etree
data='''\
<node1>
  <node2 a1="x1"> ... </node2>
  <node2 a1="x2"> ... </node2>
  <node2 a1="x1"> ... </node2>
</node1>
'''
doc = lxml.etree.XML(data)
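# find() would return only the first match (or None); findall() handles every match
for e in doc.findall('node2[@a1="x2"]'):
    doc.remove(e)
# encoding='unicode' yields a str; the default in Python 3 is bytes
print(lxml.etree.tostring(doc, encoding='unicode'))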

# <node1>
#   <node2 a1="x1"> ... </node2>
#   <node2 a1="x1"> ... </node2>
# </node1>

Edit: If node2 is buried more deeply in the XML, you can iterate through all the elements, treat each one as a potential parent, check whether any of its children is a matching node2 element, and then remove it if so:

Using only xml.etree.ElementTree:

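doc=xee.fromstring(data)
# getiterator() was removed in Python 3.9; iter() is its replacement.
# list() snapshots the traversal so removals cannot disturb it.
for parent in list(doc.iter()):
    for child in parent.findall('node2'):
        if child.attrib.get('a1')=='x2':
            parent.remove(child)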
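
Since the user supplies the path, the same idea can be wrapped in a small helper. This is only a sketch under stated assumptions: `remove_matching` and `parent_of` are hypothetical names, not part of ElementTree, and the path must stay within the XPath subset that `findall()` supports. ElementTree elements do not know their parent, so the helper first builds a child-to-parent map.

```python
import xml.etree.ElementTree as ET

def remove_matching(root, path):
    """Remove every element matched by `path` (relative to root); return the count.

    Hypothetical helper for illustration. The root itself cannot be removed,
    so `path` must match descendants only.
    """
    # ElementTree elements have no parent pointer, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    matches = root.findall(path)
    for elem in matches:
        parent_of[elem].remove(elem)
    return len(matches)

data = '''\
<node0>
  <node1>
    <node2 a1="x1"> ... </node2>
    <node2 a1="x2"> ... </node2>
    <node2 a1="x1"> ... </node2>
  </node1>
</node0>
'''
root = ET.fromstring(data)
# ".//" searches all descendants, so node2 is found at any depth.
removed = remove_matching(root, ".//node2[@a1='x2']")
result = ET.tostring(root, encoding="unicode")
print(removed)
print(result)
```

The map is built once per call, which keeps the removal loop simple at the cost of one extra pass over the tree.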

Using lxml:

doc = lxml.etree.XML(data)
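# list() snapshots the traversal so removals cannot disturb it
for parent in list(doc.iter('*')):
    # findall() removes every matching child, not just the first one
    for child in parent.findall('node2[@a1="x2"]'):
        parent.remove(child)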
