Keep lxml from creating self-closing tags


Problem description

I have an (old) tool which does not understand self-closing tags like <STATUS/>. So we need to serialize our XML files with explicit opening and closing tags, like this: <STATUS></STATUS>.

Currently, I have:

>>> from lxml import etree

>>> para = """<ERROR>The status is <STATUS></STATUS>.</ERROR>"""
>>> tree = etree.XML(para)
>>> etree.tostring(tree)
'<ERROR>The status is <STATUS/>.</ERROR>'

How can I serialize with opening and closing tags instead?

<ERROR>The status is <STATUS></STATUS>.</ERROR>

Solution

Provided by wildwilhelm, in the answer below:

>>> from lxml import etree

>>> para = """<ERROR>The status is <STATUS></STATUS>.</ERROR>"""
>>> tree = etree.XML(para)
>>> for status_elem in tree.xpath("//STATUS[string() = '']"):
...     status_elem.text = ""
>>> etree.tostring(tree)
'<ERROR>The status is <STATUS></STATUS>.</ERROR>'

Recommended answer

It seems the <STATUS> element ends up with a text attribute of None:

>>> tree[0]
<Element STATUS at 0x11708d4d0>
>>> tree[0].text
>>> tree[0].text is None
True

If you set the text attribute of the <STATUS> element to an empty string, you should get what you're looking for:

>>> tree[0].text = ''
>>> etree.tostring(tree)
'<ERROR>The status is <STATUS></STATUS>.</ERROR>'

With this in mind, you can probably walk the element tree and fix up the text attributes before writing out your XML. Something like this:

# prevent creation of self-closing tags
for node in tree.iter():
    if node.text is None:
        node.text = ''
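
The loop above can be wrapped into a small reusable function. What follows is only a minimal sketch: the name expand_empty_elements is hypothetical (it is not part of lxml), and the sample document with an extra <CODE/> element is made up for illustration. It assumes you only want to touch elements that have neither children nor text.

```python
from lxml import etree


def expand_empty_elements(root):
    """Set text to '' on every childless element whose text is None.

    Hypothetical helper name; not part of the lxml API.
    """
    # tag=etree.Element restricts iteration to real elements, so
    # comments and processing instructions are left alone.
    for node in root.iter(tag=etree.Element):
        # Only leaf elements without text would be serialized as <TAG/>.
        if len(node) == 0 and node.text is None:
            node.text = ""
    return root


# Illustrative input (assumed, not from the original question).
doc = etree.XML("<ERROR>The status is <STATUS/>.<CODE/></ERROR>")
expand_empty_elements(doc)

# encoding="unicode" asks lxml for a str rather than bytes.
serialized = etree.tostring(doc, encoding="unicode")
print(serialized)
```

Restricting the change to leaf elements is a design choice in this sketch, not something the answer requires: elements that already have children are never written as self-closing tags, so there is no need to give them an empty text node.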
