忠实地保留解析的XML中的注释 [英] Faithfully Preserve Comments in Parsed XML

查看:50
本文介绍了忠实地保留解析的XML中的注释的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在处理XML时,我希望尽可能忠实地保留注释。

I'd like to preserve comments as faithfully as possible while manipulating XML.

我设法保留注释,但是内容已转义为XML。

I managed to preserve comments, but the contents are getting XML-escaped.

#!/usr/bin/env python
# add_host_to_tomcat.py

import xml.etree.ElementTree as ET
from CommentedTreeBuilder import CommentedTreeBuilder
parser = CommentedTreeBuilder()

if __name__ == '__main__':
    filename = "/opt/lucee/tomcat/conf/server.xml"

    # this is the important part: use the comment-preserving parser
    tree = ET.parse(filename, parser)

    # get the node to add a child to
    engine_node = tree.find("./Service/Engine")

    # add a node: Engine.Host
    host_node = ET.SubElement(
        engine_node,
        "Host",
        name="local.mysite.com",
        appBase="webapps"
    )
    # add a child to new node: Engine.Host.Context
    ET.SubElement(
        host_node,
        'Context',
        path="",
        docBase="/path/to/doc/base"
    )

    tree.write('out.xml')



#!/usr/bin/env python
# CommentedTreeBuilder.py

from xml.etree import ElementTree

class CommentedTreeBuilder ( ElementTree.XMLTreeBuilder ):
    def __init__ ( self, html = 0, target = None ):
        ElementTree.XMLTreeBuilder.__init__( self, html, target )
        self._parser.CommentHandler = self.handle_comment

    def handle_comment ( self, data ):
        self._target.start( ElementTree.Comment, {} )
        self._target.data( data )
        self._target.end( ElementTree.Comment )

像这样的注释:

  <!--
EXAMPLE HOST ENTRY:
    <Host name="lucee.org" appBase="webapps">
         <Context path="" docBase="/var/sites/getrailo.org" />
     <Alias>www.lucee.org</Alias>
     <Alias>my.lucee.org</Alias>
    </Host>

HOST ENTRY TEMPLATE:
    <Host name="[ENTER DOMAIN NAME]" appBase="webapps">
         <Context path="" docBase="[ENTER SYSTEM PATH]" />
     <Alias>[ENTER DOMAIN ALIAS]</Alias>
    </Host>
  -->

最终显示为:

  <!--
            EXAMPLE HOST ENTRY:
    &lt;Host name="lucee.org" appBase="webapps"&gt;
         &lt;Context path="" docBase="/var/sites/getrailo.org" /&gt;
         &lt;Alias&gt;www.lucee.org&lt;/Alias&gt;
         &lt;Alias&gt;my.lucee.org&lt;/Alias&gt;
    &lt;/Host&gt;

    HOST ENTRY TEMPLATE:
    &lt;Host name="[ENTER DOMAIN NAME]" appBase="webapps"&gt;
         &lt;Context path="" docBase="[ENTER SYSTEM PATH]" /&gt;
         &lt;Alias&gt;[ENTER DOMAIN ALIAS]&lt;/Alias&gt;
    &lt;/Host&gt;
   -->

我还尝试了 self._target.data(saxutils.unescape(data) )(位于 CommentedTreeBuilder.py 中),但似乎没有任何作用。实际上,我认为问题发生在 handle_commment()步骤之后的某个地方。

I also tried self._target.data( saxutils.unescape(data) ) in CommentedTreeBuilder.py, but it didn't seem to do anything. In fact, I think the problem happens somewhere after the handle_commment() step.

这个问题类似于

推荐答案

经过Python 2.7和3.5的测试,以下代码应该可以正常工作。

Tested with Python 2.7 and 3.5, the following code should work as intended.

#!/usr/bin/env python
# CommentedTreeBuilder.py
from xml.etree import ElementTree

class CommentedTreeBuilder(ElementTree.TreeBuilder):
    def comment(self, data):
        self.start(ElementTree.Comment, {})
        self.data(data)
        self.end(ElementTree.Comment)

然后,在主代码中使用

parser = ElementTree.XMLParser(target=CommentedTreeBuilder())

作为解析器,而不是当前解析器。

as the parser instead of the current one.

顺便说一句,注释可以在b中正确地工作牛与 lxml 。也就是说,您可以

By the way, comments work correctly out of the box with lxml. That is, you can just do

import lxml.etree as ET
tree = ET.parse(filename)

而无需上述任何内容。

这篇关于忠实地保留解析的XML中的注释的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆