Python 中的空 XML 元素处理 [英] Empty XML element handling in Python

查看:39
本文介绍了Python 中的空 XML 元素处理的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我对 minidom 解析器处理空元素感到困惑,如以下代码部分所示.

import xml.dom.minidomdoc = xml.dom.minidom.parseString('<value></value>')打印 doc.firstChild.nodeValue.__repr__()# 输出:无打印 doc.firstChild.toxml()# 输出:<value/>doc = xml.dom.minidom.Document()v = doc.appendChild(doc.createElement('value'))v.appendChild(doc.createTextNode(''))打印 v.firstChild.nodeValue.__repr__()# 出去: ''打印 doc.firstChild.toxml()# 输出:<value></value>

我怎样才能获得一致的行为?我想接收空字符串作为空元素的值(这我首先放在XML结构中的).

解决方案

破解打开 xml.dom.minidom 并搜索/>",我们发现:

# Element(Node) 类的方法.def writexml(self, writer, indent="", addindent="", newl=""):# [剪辑]如果 self.childNodes:writer.write(">%s"%(newl))对于 self.childNodes 中的节点:node.writexml(writer,indent+addindent,addindent,newl)writer.write("%s</%s>%s" % (indent,self.tagName,newl))别的:writer.write("/>%s"%(newl))

由此我们可以推断出短尾标签形式仅在 childNodes 为空列表时出现.确实,这似乎是真的:

<预><代码>>>>文档 = 文档()>>>v = doc.appendChild(doc.createElement('v'))>>>v.toxml()'<v/>'>>>v.childNodes[]>>>v.appendChild(doc.createTextNode(''))<DOM文本节点''">>>>v.childNodes[<DOM 文本节点''">]>>>v.toxml()'<v></v>'

正如 Lloyd 所指出的,XML 规范对两者没有区别.如果您的代码确实有所区别,则意味着您需要重新考虑如何序列化数据.

xml.dom.minidom 只是显示不同的东西,因为它更容易编码.但是,您可以获得一致的输出.只需继承Element 类并覆盖toxml 方法,这样当没有非空文本内容的子节点时,它会打印出短结束标记表单.然后对模块进行猴子补丁以使用新的 Element 类.

I'm puzzled by minidom parser handling of empty element, as shown in following code section.

import xml.dom.minidom

doc = xml.dom.minidom.parseString('<value></value>')
print doc.firstChild.nodeValue.__repr__()
# Out: None
print doc.firstChild.toxml()
# Out: <value/>

doc = xml.dom.minidom.Document()
v = doc.appendChild(doc.createElement('value'))
v.appendChild(doc.createTextNode(''))
print v.firstChild.nodeValue.__repr__()
# Out: ''
print doc.firstChild.toxml()
# Out: <value></value>

How can I get consistent behavior? I'd like to receive empty string as value of empty element (which IS what I put in XML structure in the first place).

解决方案

Cracking open xml.dom.minidom and searching for "/>", we find this:

# Method of the Element(Node) class.
def writexml(self, writer, indent="", addindent="", newl=""):
    # [snip]
    if self.childNodes:
        writer.write(">%s"%(newl))
        for node in self.childNodes:
            node.writexml(writer,indent+addindent,addindent,newl)
        writer.write("%s</%s>%s" % (indent,self.tagName,newl))
    else:
        writer.write("/>%s"%(newl))

We can deduce from this that the short-end-tag form only occurs when childNodes is an empty list. Indeed, this seems to be true:

>>> doc = Document()
>>> v = doc.appendChild(doc.createElement('v'))
>>> v.toxml()
'<v/>'
>>> v.childNodes
[]
>>> v.appendChild(doc.createTextNode(''))
<DOM Text node "''">
>>> v.childNodes
[<DOM Text node "''">]
>>> v.toxml()
'<v></v>'

As pointed out by Lloyd, the XML spec makes no distinction between the two. If your code does make the distinction, that means you need to rethink how you want to serialize your data.

xml.dom.minidom simply displays something differently because it's easier to code. You can, however, get consistent output. Simply inherit the Element class and override the toxml method such that it will print out the short-end-tag form when there are no child nodes with non-empty text content. Then monkeypatch the module to use your new Element class.

这篇关于Python 中的空 XML 元素处理的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆