Remove whitespaces in XML string


Question

How can I remove the whitespace and line breaks in an XML string in Python 2.6? I tried the following packages:

etree: This snippet keeps the original whitespace:

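import xml.etree.ElementTree

xmlStr = '''<root>
    <head></head>
    <content></content>
</root>'''

# Parse and re-serialize; the whitespace between elements survives the round trip.
xmlElement = xml.etree.ElementTree.XML(xmlStr)
xmlStr = xml.etree.ElementTree.tostring(xmlElement, 'UTF-8')
print xmlStr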

I cannot use Python 2.7, which would provide the method parameter.

minidom: Same result:

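import xml.dom.minidom

# Empty indent and newline strings add nothing, but the existing whitespace text nodes remain.
xmlDocument = xml.dom.minidom.parseString(xmlStr)
xmlStr = xmlDocument.toprettyxml(indent='', newl='', encoding='UTF-8')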


Recommended answer

The easiest solution is probably to use lxml, where you can set a parser option to ignore whitespace between elements:

>>> from lxml import etree
>>> parser = etree.XMLParser(remove_blank_text=True)
>>> xml_str = '''<root>
...     <head></head>
...     <content></content>
... </root>'''
>>> elem = etree.XML(xml_str, parser=parser)
>>> print etree.tostring(elem)
<root><head/><content/></root>

This will probably be enough for your needs, but a few caveats are worth knowing:

This only removes whitespace nodes between elements, and it tries not to remove whitespace nodes inside elements with mixed content:

>>> elem = etree.XML('<p> spam <a>ham</a> <a>eggs</a></p>', parser=parser)
>>> print etree.tostring(elem)
<p> spam <a>ham</a> <a>eggs</a></p>

Leading or trailing whitespace in text nodes is not removed. In some circumstances, however, whitespace nodes are still removed from mixed content: this happens when the parser has not yet encountered a non-whitespace text node at that level.

>>> elem = etree.XML('<p><a> ham</a> <a>eggs</a></p>', parser=parser)
>>> print etree.tostring(elem)
<p><a> ham</a><a>eggs</a></p>

If you don't want that, you can use xml:space="preserve", which will be respected. Another option is to use a DTD together with etree.XMLParser(load_dtd=True); the parser will then use the DTD to determine which whitespace nodes are significant.
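As a rough illustration of the xml:space="preserve" option, the sketch below reuses the markup from the previous example and adds the attribute to the outer element. It assumes lxml is installed; the variable name separator is only illustrative.

```python
from lxml import etree

parser = etree.XMLParser(remove_blank_text=True)

# Same markup as the previous example, plus xml:space="preserve" on <p>.
elem = etree.XML(
    '<p xml:space="preserve"><a> ham</a> <a>eggs</a></p>', parser=parser)

# .tail is the text that follows an element's closing tag; here it is
# the whitespace between the two <a> elements, which should be kept.
separator = elem[0].tail
print(repr(separator))
```

Without the attribute, the same lookup returns None, because the parser drops that whitespace node.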

Other than that, you will have to write your own code to remove the whitespace you don't want: iterate over the descendants and, where appropriate, set .text and .tail properties that contain only whitespace to None or an empty string.
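One possible shape for such hand-written code is sketched below, using only the standard library's xml.etree.ElementTree. The helper name strip_blank_text and its preserve parameter are hypothetical, and honoring xml:space here is a design choice of this sketch, not something ElementTree does on its own.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:space attribute under the XML namespace URI.
XML_SPACE = '{http://www.w3.org/XML/1998/namespace}space'


def strip_blank_text(elem, preserve=False):
    """Hypothetical helper: drop whitespace-only .text/.tail in place."""
    space = elem.get(XML_SPACE)
    if space == 'preserve':
        preserve = True
    elif space == 'default':
        preserve = False
    if not preserve:
        # Whitespace-only text before the first child.
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        # A child's tail is part of this element's content.
        for child in elem:
            if child.tail is not None and not child.tail.strip():
                child.tail = None
    for child in elem:
        strip_blank_text(child, preserve)
    return elem


root = ET.XML('''<root>
    <head></head>
    <content></content>
</root>''')
strip_blank_text(root)
print(ET.tostring(root))
```

Unlike remove_blank_text in lxml, this sketch makes no attempt to detect mixed content, so a lone space between two inline elements is removed unless an ancestor carries xml:space="preserve".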
