XML filtering with Python
Question
I have the following XML document:
<node0>
<node1>
<node2 a1="x1"> ... </node2>
<node2 a1="x2"> ... </node2>
<node2 a1="x1"> ... </node2>
</node1>
</node0>
I want to filter out node2 when a1="x2". The user provides the XPath and the attribute values that need to be tested and filtered out. I looked at some Python solutions such as BeautifulSoup, but they are too complicated and don't preserve the case of the text. I want to keep the document the same as before, with only some content filtered out.
Can you recommend a simple and succinct solution? This should not be too complicated from the looks of it. The actual XML document is not as simple as the one above, but the idea is the same.
Recommended answer
This uses xml.etree.ElementTree, which is in the standard library:
import xml.etree.ElementTree as xee
data='''\
<node1>
<node2 a1="x1"> ... </node2>
<node2 a1="x2"> ... </node2>
<node2 a1="x1"> ... </node2>
</node1>
'''
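doc = xee.fromstring(data)
# findall() returns a list, so removing elements inside the loop is safe
for tag in doc.findall('node2'):
    # .get() returns None instead of raising KeyError when a1 is absent
    if tag.get('a1') == 'x2':
        doc.remove(tag)
# encoding='unicode' returns str rather than bytes on Python 3
print(xee.tostring(doc, encoding='unicode'))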
# <node1>
# <node2 a1="x1"> ... </node2>
# <node2 a1="x1"> ... </node2>
# </node1>
This uses lxml, which is not in the standard library but has a more powerful syntax:
import lxml.etree
data='''\
<node1>
<node2 a1="x1"> ... </node2>
<node2 a1="x2"> ... </node2>
<node2 a1="x1"> ... </node2>
</node1>
'''
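doc = lxml.etree.XML(data)
# find() returns only the first matching node2, or None if there is none
e = doc.find('node2[@a1="x2"]')
if e is not None:
    doc.remove(e)
print(lxml.etree.tostring(doc, encoding='unicode'))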
# <node1>
# <node2 a1="x1"> ... </node2>
# <node2 a1="x1"> ... </node2>
# </node1>
Edit: If node2 is buried more deeply in the XML, you can iterate through all the tags, check each parent tag to see whether a node2 element is one of its children, and remove it if so:
Using only xml.etree.ElementTree:
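doc = xee.fromstring(data)
# iter() replaces getiterator(), which was removed in Python 3.9;
# list() takes a snapshot so removals do not disturb the traversal
for parent in list(doc.iter()):
    for child in parent.findall('node2'):
        if child.get('a1') == 'x2':
            parent.remove(child)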
Using lxml:
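doc = lxml.etree.XML(data)
# list() takes a snapshot so removals do not disturb the traversal
for parent in list(doc.iter('*')):
    # findall() catches every matching child, not just the first
    for child in parent.findall('node2[@a1="x2"]'):
        parent.remove(child)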
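Since the question says the user supplies the path and the attribute value, the approach above could be wrapped in a small helper. The following is only a sketch using the standard library: filter_xml and parent_of are hypothetical names chosen for illustration, and the path is assumed to be in the limited XPath subset that ElementTree supports (for example .//node2) and to never match the root element itself.

```python
import xml.etree.ElementTree as ET


def filter_xml(xml_text, path, attr, value):
    """Return xml_text with every element matching `path` whose
    attribute `attr` equals `value` removed (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # ElementTree elements do not know their parent, so build a lookup first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # findall() returns a list, so removing inside the loop is safe.
    for elem in root.findall(path):
        if elem.get(attr) == value:
            parent_of[elem].remove(elem)
    return ET.tostring(root, encoding="unicode")


data = """<node0>
<node1>
<node2 a1="x1"> ... </node2>
<node2 a1="x2"> ... </node2>
<node2 a1="x1"> ... </node2>
</node1>
</node0>"""

result = filter_xml(data, ".//node2", "a1", "x2")
print(result)
```

Building the parent lookup once avoids re-walking the tree for every match, at the cost of holding one dictionary entry per element. Note that ElementTree does not preserve comments or the original XML declaration by default, so this may not keep the document byte-for-byte identical.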