Python minidom/xml: How to set node text with the minidom API

Question

I am currently trying to load an XML file and modify the text inside a pair of XML tags, like this:

   <anode>sometext</anode>

I currently have a helper function called getText that I use to get the text sometext above. Now I guess I need to modify the childNodes of a node that contains the XML snippet shown above, changing sometext to othertext. The commonly suggested getText helper is shown in the footnote below.

So my question is: given that this is how we get the text, how do I write a companion helper function called setText(node, 'newtext')? I'd prefer it to operate at the node level, find its own way down to the child nodes, and work robustly.

A previous question has an accepted answer that says "I'm not sure you can modify the DOM in place". Is that really true? Is minidom so broken that it's effectively read-only?

By way of footnote: to read the text between <anode> and </anode>, I was surprised that no single, simple minidom function exists, and that this small helper function is suggested in the Python XML tutorials:

import xml.dom.minidom

def getText(nodelist):
    rc = []
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            rc.append(node.data)
    return ''.join(rc)

# I've added this bit to make usage of the above clearer
def getTextFromNode(node):
    return getText(node.childNodes)
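
As a quick illustration of the helper above, here is a minimal sketch that runs it against the one-line snippet from the question. The variable names `doc`, `anode`, and `text` are only for illustration, and the sketch assumes Python 3.

```python
from xml.dom.minidom import parseString

def getText(nodelist):
    # Join the data of the direct TEXT_NODE children only
    # (CDATA sections and nested elements are skipped).
    rc = []
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            rc.append(node.data)
    return ''.join(rc)

# The snippet from the question, parsed as a complete document.
doc = parseString("<anode>sometext</anode>")
anode = doc.documentElement

text = getText(anode.childNodes)
print(text)
```

Note that the helper only looks at direct children, so text nested inside child elements is not included.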

Elsewhere on StackOverflow, I see this accepted answer from 2008:

   node[0].firstChild.nodeValue

If that's how hard it is to read with minidom, I'm not surprised to see that people say "Just don't do it!" when you ask how to write things that might modify the Node structure of your XML document.

Update: the answer below shows it's not as hard as I thought.

Recommended answer

Actually, minidom is no more difficult to use than other DOM parsers. If you don't like it, you may want to consider complaining to the W3C.

from xml.dom.minidom import parseString

XML = """
<nodeA>
    <nodeB>Text hello</nodeB>
    <nodeC><noText></noText></nodeC>
</nodeA>
"""


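def replaceText(node, newText):
    # firstChild is None for an empty element, so check that before nodeType
    first = node.firstChild
    if first is None or first.nodeType != node.TEXT_NODE:
        raise ValueError("node does not contain text")

    # replaceWholeText replaces this text node and any adjacent text siblings
    first.replaceWholeText(newText)

def main():
    doc = parseString(XML)

    node = doc.getElementsByTagName('nodeB')[0]
    replaceText(node, "Hello World")

    print(doc.toxml())

    try:
        # nodeC's first child is the <noText> element, not a text node
        node = doc.getElementsByTagName('nodeC')[0]
        replaceText(node, "Hello World")
    except ValueError:
        print("error")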


if __name__ == '__main__':
    main()
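
The question asked for a setText(node, 'newtext') companion that also copes with elements that have no text yet. What follows is one possible sketch, not part of the original answer: it assumes that "setting the text" means dropping the element's direct text and CDATA children and appending a single new text node, leaving any child elements in place. The sample document and the `<empty/>` element are made up for illustration.

```python
from xml.dom.minidom import parseString

def setText(node, newText):
    """Replace the direct text children of `node` with one text node.

    Assumption: child elements are kept; only TEXT and CDATA children go.
    """
    # Iterate over a copy, because removeChild mutates node.childNodes.
    for child in list(node.childNodes):
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            node.removeChild(child)
    # New nodes must be created by the owning document.
    node.appendChild(node.ownerDocument.createTextNode(newText))

# Hypothetical sample document for illustration.
doc = parseString("<root><anode>sometext</anode><empty/></root>")
anode = doc.getElementsByTagName("anode")[0]
empty = doc.getElementsByTagName("empty")[0]

setText(anode, "othertext")   # existing text is replaced
setText(empty, "filled")      # works even when there was no text child

result = doc.documentElement.toxml()
print(result)
```

Unlike replaceText above, this version does not raise when the first child is missing or is not text, which may or may not be what you want; for mixed content the new text always ends up after any remaining child elements.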
