XML pretty print fails in Python lxml
Problem description
I am trying to read, modify, and write an XML file with lxml 4.1.1 in Python 2.7.6.
My code:
import lxml.etree as et
fn_xml_in = 'in.xml'
parser = et.XMLParser(remove_blank_text=True)
xml_doc = et.parse(fn_xml_in, parser)
xml_doc.getroot().find('b').append(et.Element('c'))
xml_doc.write('out.xml', method='html', pretty_print=True)
The input file in.xml looks like this:
<a>
  <b/>
</a>
And the resulting output file out.xml:
<a>
  <b><c></c></b>
</a>
Or, when I set remove_blank_text=True:
<a><b><c></c></b></a>
I would have expected lxml to insert line breaks and indentation within the b element:
<a>
  <b>
    <c></c>
  </b>
</a>
How can I achieve this?
I have tried some tidylib wrappers, but they seem to specialize in HTML rather than XML.
I have also tried to add newline characters as b's tail, but then even the indentation is broken.
I need the c element to remain separated into an opening and a closing tag: <c></c>. This is why I use method='html' in the example.
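A minimal sketch of the behaviour the asker is relying on, assuming lxml is installed: for an element it does not know to be void, the HTML serializer writes an explicit end tag, while the default XML serializer collapses an empty element into a self-closing tag.

```python
import lxml.etree as et

# An element with no children and no text (.text is None)
c = et.Element('c')

# Default XML output method: an empty element is self-closed
xml_out = et.tostring(c)
# HTML output method: a non-void element keeps separate start and end tags
html_out = et.tostring(c, method='html')

print(xml_out.decode())   # <c/>
print(html_out.decode())  # <c></c>
```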
Recommended answer
Use the "xml" output method when writing (that's the default so it does not have to be given explicitly).
Set the text property of the c element to an empty string to ensure that the element gets serialized as <c></c>.
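To see why the empty string matters, here is a small sketch (again assuming lxml is installed) that serializes two otherwise identical elements with the default XML method: one left with .text as None, which is the value for a freshly created element, and one with .text set to ''.

```python
import lxml.etree as et

c_default = et.Element('c')  # .text is None
c_empty = et.Element('c')
c_empty.text = ''            # empty string rather than None

print(et.tostring(c_default).decode())  # <c/>
print(et.tostring(c_empty).decode())    # <c></c>
```

Because the fix lives on the element itself, the default XML method (and therefore pretty_print) can still be used for the rest of the tree.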
Code:
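import lxml.etree as et
# remove_blank_text drops the ignorable whitespace from in.xml,
# so pretty_print is free to re-indent the whole tree
parser = et.XMLParser(remove_blank_text=True)
xml_doc = et.parse('in.xml', parser)
b = xml_doc.getroot().find('b')
c = et.Element('c')
c.text = ''  # empty string (not None) forces <c></c> instead of <c/>
b.append(c)
# method='xml' is the default; pretty_print adds line breaks and indentation
xml_doc.write('out.xml', pretty_print=True)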
Result (out.xml):
<a>
  <b>
    <c></c>
  </b>
</a>
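The same fix can be checked without touching the file system. The sketch below assumes lxml is installed and swaps the file-based et.parse and write calls for et.fromstring and et.tostring; the IN_XML string is an illustrative stand-in that mirrors in.xml.

```python
import lxml.etree as et

IN_XML = b'<a>\n  <b/>\n</a>'  # illustrative copy of in.xml

# Drop the ignorable whitespace so pretty_print can re-indent everything
parser = et.XMLParser(remove_blank_text=True)
root = et.fromstring(IN_XML, parser)

c = et.Element('c')
c.text = ''  # keep <c></c> rather than <c/>
root.find('b').append(c)

# Default XML method with pretty printing
out = et.tostring(root, pretty_print=True)
print(out.decode())
```

Printing out should reproduce the indented result shown above.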