XML pretty print fails in Python lxml


Question

I am trying to read, modify, and write an XML file with lxml 4.1.1 in Python 2.7.6.

My code:

import lxml.etree as et

fn_xml_in = 'in.xml'
parser = et.XMLParser(remove_blank_text=True)
xml_doc = et.parse(fn_xml_in, parser)
xml_doc.getroot().find('b').append(et.Element('c'))
xml_doc.write('out.xml', method='html', pretty_print=True)

The input file in.xml looks like this:

<a>
    <b/>
</a>

And the resulting output file out.xml:

<a>
    <b><c></c></b>
</a>

Or, when I set remove_blank_text=True:

<a><b><c></c></b></a>

I would have expected lxml to insert line breaks and indentation within the b element:

<a>
    <b>
        <c></c>
    </b>
</a>

How can I achieve this?

I have tried some tidy lib wrappers, but they seem to specialize in HTML rather than XML.

I have also tried to add newline characters as b's tail, but then even the indentation is broken.

I need the c element to remain separated into an opening and a closing tag: <c></c>. This is why I use method='html' in the example.

Recommended answer

Use the "xml" output method when writing (that's the default so it does not have to be given explicitly).

Set the text property of the c element to an empty string to ensure that the element gets serialized as <c></c>.

Code:

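import lxml.etree as et

# Drop whitespace-only text nodes so pretty_print can re-indent the tree.
parser = et.XMLParser(remove_blank_text=True)
xml_doc = et.parse('in.xml', parser)

b = xml_doc.getroot().find('b')
c = et.Element('c')
c.text = ''  # empty string, not None, so c is written as <c></c> rather than <c/>
b.append(c)

# method defaults to 'xml', so it is not passed here.
xml_doc.write('out.xml', pretty_print=True)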

Result (out.xml):

<a>
  <b>
    <c></c>
  </b>
</a>
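
The effect of remove_blank_text can also be checked in memory, without reading or writing files. The following is only a sketch, not part of the original answer: SOURCE and the helper build_tree are hypothetical names introduced for illustration. The last step assumes that et.indent() is available, which requires lxml 4.5 or later (newer than the 4.1.1 mentioned in the question). It shows one possible way to get the four-space indentation the question asked for.

```python
import lxml.etree as et

SOURCE = b"<a>\n    <b/>\n</a>"  # same content as in.xml


def build_tree(strip_blank_text):
    """Parse SOURCE and append <c></c> to <b>, as in the answer (hypothetical helper)."""
    parser = et.XMLParser(remove_blank_text=strip_blank_text)
    root = et.fromstring(SOURCE, parser)
    c = et.SubElement(root.find('b'), 'c')
    c.text = ''  # empty string (not None) keeps <c></c> from collapsing to <c/>
    return root


# Whitespace kept: pretty_print leaves the existing layout alone,
# so <c> ends up on the same line as <b>.
kept = et.tostring(build_tree(False), pretty_print=True).decode('ascii')

# Whitespace stripped: pretty_print re-indents with two spaces per level.
stripped = et.tostring(build_tree(True), pretty_print=True).decode('ascii')

# Four-space indentation. Assumption: et.indent() exists (lxml >= 4.5).
root = build_tree(True)
et.indent(root, space='    ')
indented = et.tostring(root).decode('ascii')

print(kept)
print(stripped)
print(indented)
```

If et.indent() is not available in the installed lxml version, the first two calls still run on their own, and the two-space output of pretty_print=True is what the answer's code produces.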
