Empty XML element handling in Python
Question
I'm puzzled by how the minidom parser handles empty elements, as shown in the following code:
import xml.dom.minidom
# Case 1: element parsed from a string
doc = xml.dom.minidom.parseString('<value></value>')
print(repr(doc.firstChild.nodeValue))
# Out: None
print(doc.firstChild.toxml())
# Out: <value/>
# Case 2: element built by hand with an empty text node
doc = xml.dom.minidom.Document()
v = doc.appendChild(doc.createElement('value'))
v.appendChild(doc.createTextNode(''))
print(repr(v.firstChild.nodeValue))
# Out: ''
print(doc.firstChild.toxml())
# Out: <value></value>
How can I get consistent behavior? I'd like to receive an empty string as the value of an empty element (which IS what I put into the XML structure in the first place).
Cracking open xml.dom.minidom and searching for "/>", we find this:
# Method of the Element(Node) class.
def writexml(self, writer, indent="", addindent="", newl=""):
    # [snip]
    if self.childNodes:
        writer.write(">%s"%(newl))
        for node in self.childNodes:
            node.writexml(writer,indent+addindent,addindent,newl)
        writer.write("%s</%s>%s" % (indent,self.tagName,newl))
    else:
        writer.write("/>%s"%(newl))
We can deduce from this that the short-end-tag form only occurs when childNodes is an empty list. Indeed, this seems to be true:
>>> from xml.dom.minidom import Document
>>> doc = Document()
>>> v = doc.appendChild(doc.createElement('v'))
>>> v.toxml()
'<v/>'
>>> v.childNodes
[]
>>> v.appendChild(doc.createTextNode(''))
<DOM Text node "''">
>>> v.childNodes
[<DOM Text node "''">]
>>> v.toxml()
'<v></v>'
As pointed out by Lloyd, the XML spec makes no distinction between the two. If your code does make the distinction, that means you need to rethink how you want to serialize your data.
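If the goal is simply to read back '' for an empty element, as the question asks, one option is to normalize on the reading side. Element.nodeValue is always None in the DOM, so the text has to be collected from the element's Text children instead. The helper name element_text below is hypothetical; this is a minimal sketch, not a minidom API.

```python
import xml.dom.minidom


def element_text(element):
    """Hypothetical helper: concatenate the text held directly by element.

    An element with no children (as parsed from "<value></value>") yields "",
    exactly like an element whose only child is an empty Text node.
    """
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    )


# Case 1: parsed from a string (no child nodes at all).
parsed = xml.dom.minidom.parseString("<value></value>").documentElement

# Case 2: built by hand with an explicit empty Text node.
doc = xml.dom.minidom.Document()
built = doc.appendChild(doc.createElement("value"))
built.appendChild(doc.createTextNode(""))

print(repr(element_text(parsed)))  # ''
print(repr(element_text(built)))   # ''
```

Reading the value this way makes the result independent of which serialized form produced the tree.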
xml.dom.minidom serializes the two cases differently simply because that is easier to code. You can, however, get consistent output. Subclass the Element class and override its serialization so that it writes the short end-tag form whenever the element has no children other than empty text nodes; override writexml rather than only toxml, because toxml delegates to writexml and a parent node serializes its children by calling their writexml methods. Then monkeypatch the module to use your new Element class.
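A minimal sketch of that approach, assuming Python 3's xml.dom.minidom, where Element.writexml takes (writer, indent, addindent, newl). The class name ShortTagElement and the helper _is_empty_text are hypothetical. Instead of copying minidom's serialization code, the override temporarily hides the empty text nodes so the inherited writexml takes its "no children" branch. Assigning to xml.dom.minidom.Element is the monkeypatch; it relies on Document.createElement looking up Element in the module namespace at call time, which is how CPython's implementation behaves but is not a documented API.

```python
import xml.dom.minidom as minidom
from xml.dom.minicompat import NodeList

_BaseElement = minidom.Element


def _is_empty_text(node):
    """True for a Text node whose data is the empty string."""
    return node.nodeType == minidom.Node.TEXT_NODE and node.data == ""


class ShortTagElement(_BaseElement):
    """Hypothetical Element subclass that writes <tag/> when the element
    has no children other than empty Text nodes."""

    def writexml(self, writer, indent="", addindent="", newl=""):
        children = self.childNodes
        if children and all(_is_empty_text(c) for c in children):
            # Hide the empty Text nodes so the inherited writexml takes its
            # "no children" branch and emits the short end-tag form.
            self.childNodes = NodeList()
            try:
                super().writexml(writer, indent, addindent, newl)
            finally:
                self.childNodes = children  # restore the original children
        else:
            super().writexml(writer, indent, addindent, newl)


# Monkeypatch: Document.createElement looks up Element in the module.
minidom.Element = ShortTagElement

doc = minidom.Document()
v = doc.appendChild(doc.createElement("value"))
v.appendChild(doc.createTextNode(""))
print(v.toxml())  # <value/>
```

Hiding the children avoids duplicating minidom's attribute-escaping code, at the cost of mutating the node while it is being written, so this sketch is not safe if another thread reads the same tree during serialization.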