How can I prevent lxml from auto-closing empty elements when serializing to string?

Question

I am parsing a huge XML file that contains many empty elements, such as

<MemoryEnv></MemoryEnv>

Using

etree.tostring(root_element, pretty_print=True)

the output collapses these elements to

<MemoryEnv/>

Is there any way to prevent this? etree.tostring() does not provide such a facility.

Is there a way to interfere with lxml's tostring() serializer?

By the way, the html module does not work. It is not designed for XML, and it does not write empty elements in their original form.

The problem is that, although the collapsed and uncollapsed forms of an empty element are equivalent, the program that parses this file does not work with collapsed empty elements.

Recommended answer

Here is one way to do it: ensure that the text value of every empty element is not None. An element whose text is an empty string is serialized as an explicit start tag and end tag pair.

Example:

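from lxml import etree

XML = """
<root>
  <MemoryEnv></MemoryEnv>
  <AlsoEmpty></AlsoEmpty>
  <foo>bar</foo>
</root>"""

doc = etree.fromstring(XML)

# An element with no text has text set to None; give it an empty string instead.
for elem in doc.iter():
    if elem.text is None:
        elem.text = ''

# encoding="unicode" returns a str rather than bytes (Python 3).
print(etree.tostring(doc, encoding="unicode"))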

Output:

<root>
  <MemoryEnv></MemoryEnv>
  <AlsoEmpty></AlsoEmpty>
  <foo>bar</foo>
</root>
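
For reuse, the loop above could be wrapped in a small helper. The following is only a sketch: the function name expand_empty_elements and the sample document are made up for illustration and are not part of lxml. It assumes you want to touch only real elements that have no children, so comments and processing instructions are skipped by iterating with tag=etree.Element.

```python
from lxml import etree


def expand_empty_elements(root):
    """Hypothetical helper: make childless, text-less elements serialize as <a></a>."""
    # tag=etree.Element restricts iteration to elements (no comments or PIs).
    for elem in root.iter(tag=etree.Element):
        # Only elements with no children and no text would be collapsed to <a/>.
        if len(elem) == 0 and elem.text is None:
            elem.text = ""
    return root


# Illustrative input with collapsed elements, a nested one, and a comment.
doc = etree.fromstring("<root><a/><b><c/></b><!-- note --><d>x</d></root>")
result = etree.tostring(expand_empty_elements(doc), encoding="unicode")
print(result)
```

The helper modifies the tree in place and returns it only for convenience, so it can be called directly inside the tostring() call.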


An alternative is to use the write_c14n() method to write canonical XML (which does not use the special empty-element syntax) to a file.

from lxml import etree

XML = """
<root>
  <MemoryEnv></MemoryEnv>
  <AlsoEmpty></AlsoEmpty>
  <foo>bar</foo>
</root>"""

doc = etree.fromstring(XML)

doc.getroottree().write_c14n("out.xml")
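
Since the question asks about serializing to a string rather than a file, canonical output can also be requested from etree.tostring() by passing method="c14n". This is a minimal sketch with a made-up input document. Note that canonicalization changes more than empty elements: for example, it omits the XML declaration and sorts attributes, so check that the consuming program accepts that form.

```python
from lxml import etree

# Illustrative input containing a collapsed empty element.
doc = etree.fromstring("<root><MemoryEnv/><foo>bar</foo></root>")

# method="c14n" returns canonical XML as UTF-8 encoded bytes;
# no encoding argument is passed alongside it.
canonical = etree.tostring(doc, method="c14n")
print(canonical.decode("utf-8"))
```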
