How to remove elements from XML using Python

Problem description

I'm stuck with XML and Python. The task is simple, but I haven't managed to solve it so far and have spent a long time on it. I'm hoping for advice on how to solve it in a couple of lines.

Thanks for any help with traversing the tree. I always end up with too many or too few elements. Elements can be nested to any depth; the XML below is just an example. I'll accept any solution and am not picky about dom, minidom, sax or anything else.

I have an XML file similar to this one:

<root>
    <elm>
        <elm>Common content</elm>

        <elm xmlns="http://example.org/ns">
            <elm lang="en">Content EN</elm>
            <elm lang="cs">žluťoučký koníček</elm>
        </elm>

        <elm xml:id="abc123">Common content</elm>

        <elm lang="en">Content EN</elm>
        <elm lang="cs">Content CS</elm>

        <elm lang="en">
            <elm>Content EN</elm>
            <elm>Content EN</elm>
        </elm>

        <elm lang="cs">
            <elm>Content CS</elm>
            <elm>Content CS</elm>
        </elm>
    </elm>
</root>

What I need: parse the XML and write a new file. The new file should contain all elements for a given language plus all elements that have no lang attribute.

For the "cs" language, the output file should contain this:

<root>
    <elm>
        <elm>Common content</elm>

        <elm xmlns="http://example.org/ns">
            <elm lang="cs">žluťoučký koníček</elm>
        </elm>

        <elm xml:id="abc123">Common content</elm>

        <elm lang="cs">Content CS</elm>

        <elm lang="cs">
            <elm>Content CS</elm>
            <elm>Content CS</elm>
        </elm>
    </elm>
</root>
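A minimal sketch of this filtering with only the standard library (xml.etree.ElementTree), written for Python 3 rather than the Python 2.5 mentioned in UPDATE2. It assumes the rule described above: an element with a lang attribute whose value differs from the target language is removed together with its whole subtree, every other element is kept, and the root itself is never removed. The name filter_lang and the shortened sample document are illustrative only.

```python
import xml.etree.ElementTree as ET


def filter_lang(parent, lang):
    """Recursively drop descendants of parent whose lang differs from lang.

    Elements without a lang attribute are kept; the parent itself is never
    removed by this (hypothetical) helper.
    """
    # Loop over a snapshot (list(parent)) so that removing a child cannot
    # make the loop skip the sibling that follows it.
    for child in list(parent):
        child_lang = child.get('lang')
        if child_lang is not None and child_lang != lang:
            parent.remove(child)  # removes the child's whole subtree at once
        else:
            # Recurse only into kept children, so any nesting depth works.
            filter_lang(child, lang)


SAMPLE = """<root><elm>
  <elm>Common content</elm>
  <elm lang="en">Content EN</elm>
  <elm lang="cs">Content CS</elm>
  <elm lang="en"><elm>Content EN</elm></elm>
  <elm lang="cs"><elm>Content CS</elm></elm>
</elm></root>"""

root = ET.fromstring(SAMPLE)
filter_lang(root, 'cs')
print(ET.tostring(root, encoding='unicode'))
```

Removing an element also removes its tail text, which here is only indentation whitespace, so the printed result may look unevenly indented.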

If the lang attribute can also be omitted in the new file, even better, but that's not essential.
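Dropping the attribute can be a separate pass once only the wanted language is left. A short sketch, assuming the tree has already been filtered; strip_attribute is a hypothetical helper name, and Element.iter() is used because it visits the element itself and every descendant, however deep.

```python
import xml.etree.ElementTree as ET


def strip_attribute(root, name):
    """Delete attribute `name` from root and all of its descendants."""
    # iter() walks root plus every nested element in document order.
    for elem in root.iter():
        elem.attrib.pop(name, None)  # no error when the attribute is absent


root = ET.fromstring('<root><elm lang="cs">A</elm>'
                     '<elm lang="cs"><elm>B</elm></elm><elm>C</elm></root>')
strip_attribute(root, 'lang')
print(ET.tostring(root, encoding='unicode'))
# <root><elm>A</elm><elm><elm>B</elm></elm><elm>C</elm></root>
```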

UPDATE1: Added Unicode characters and a namespace attribute.
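To see how the standard library reports the namespaced and non-ASCII parts of the sample, here is a small sketch (Python 3, xml.etree.ElementTree; the trimmed document and the XML_NS constant are illustrative). Elements under xmlns="..." get tags in {uri}local form, xml:id becomes an attribute keyed by the XML namespace URI, and the unprefixed lang attribute stays a plain 'lang' key because a default namespace does not apply to attributes.

```python
import xml.etree.ElementTree as ET

XML_NS = '{http://www.w3.org/XML/1998/namespace}'

# Encoded to UTF-8 bytes, as the content would be read from a file on disk
DATA = '''<root>
  <elm xmlns="http://example.org/ns"><elm lang="cs">žluťoučký koníček</elm></elm>
  <elm xml:id="abc123">Common content</elm>
</root>'''.encode('utf-8')

root = ET.fromstring(DATA)
for elem in root.iter():
    # Namespaced tags look like {http://example.org/ns}elm, xml:id is keyed
    # as {http://www.w3.org/XML/1998/namespace}id, lang stays 'lang'.
    print(elem.tag, elem.attrib)
```

So a filter that tests elem.get('lang') also works inside the namespaced element, whereas code that matches on the bare tag name 'elm' would silently skip it.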

UPDATE2: Using Python 2.5, standard libraries preferred.
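Writing the result to a new file with only the standard library could look like the sketch below. It targets Python 3 (open() with an encoding argument does not exist in 2.5); copy_xml and the doc.xml / doc_cs.xml file names are hypothetical, and the filtering step is left as a comment. Writing UTF-8 with an XML declaration keeps the Czech text intact. One caveat: ElementTree preserves the meaning of the namespace but not its spelling, so the default xmlns is written back with a generated ns0-style prefix on the root element.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def copy_xml(in_path, out_path):
    """Parse in_path and write the (possibly modified) tree to out_path."""
    tree = ET.parse(in_path)
    # ... filter tree.getroot() here, e.g. by its lang attributes ...
    tree.write(out_path, encoding='utf-8', xml_declaration=True)


src = ('<root><elm xmlns="http://example.org/ns">'
       '<elm lang="cs">žluťoučký koníček</elm></elm></root>')
tmpdir = tempfile.mkdtemp()
in_path = os.path.join(tmpdir, 'doc.xml')      # hypothetical file names
out_path = os.path.join(tmpdir, 'doc_cs.xml')
with open(in_path, 'w', encoding='utf-8') as f:
    f.write(src)

copy_xml(in_path, out_path)
with open(out_path, encoding='utf-8') as f:
    print(f.read())
```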

Recommended answer

Using lxml:

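import lxml.etree as le

# Let lxml open the file itself so it can apply the document's encoding
doc = le.parse('doc.xml')
lang = 'en'  # language to keep; elements without lang are always kept

# Every element, in any namespace, that carries an unprefixed lang attribute
for elem in doc.xpath('//*[attribute::lang]'):
    if elem.attrib['lang'] == lang:
        # Keep the element, but drop the now-redundant lang attribute
        elem.attrib.pop('lang')
    else:
        # Remove the element together with its whole subtree
        elem.getparent().remove(elem)

print(le.tostring(doc, encoding='unicode'))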
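The answer relies on two lxml features, full XPath and getparent(), that xml.etree.ElementTree lacks. A rough standard-library equivalent, sketched for Python 3 with the same "keep en" choice, builds a child-to-parent dictionary up front and uses ElementTree's limited XPath subset, which does support the .//*[@lang] predicate. keep_language is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def keep_language(root, lang):
    """Keep elements whose lang equals `lang` (dropping the attribute) and
    remove elements with any other lang value, mirroring the lxml answer."""
    # ElementTree elements have no getparent(), so map child -> parent first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # findall() returns a list, so removing elements inside the loop is safe.
    for elem in root.findall('.//*[@lang]'):
        if elem.get('lang') == lang:
            del elem.attrib['lang']
        else:
            parent_of[elem].remove(elem)
    return root


SAMPLE = ('<root><elm>Common</elm><elm lang="en">EN</elm>'
          '<elm lang="cs">CS</elm><elm lang="en"><elm>EN2</elm></elm></root>')
root = keep_language(ET.fromstring(SAMPLE), 'en')
print(ET.tostring(root, encoding='unicode'))
# <root><elm>Common</elm><elm>EN</elm><elm><elm>EN2</elm></elm></root>
```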

Output, keeping the "en" elements:

<root>
    <elm>Common content</elm>

    <elm>
        <elm>Content EN</elm>
        </elm>

    <elm>Common content</elm>

    <elm>Content EN</elm>
    <elm>
        <elm>Content EN</elm>
        <elm>Content EN</elm>
    </elm>

    </root>
