Faithfully Preserve Comments in Parsed XML

Question

I'd like to preserve comments as faithfully as possible while manipulating XML.

I managed to preserve comments, but their contents are getting XML-escaped.

#!/usr/bin/env python
# add_host_to_tomcat.py

import xml.etree.ElementTree as ET
from CommentedTreeBuilder import CommentedTreeBuilder
parser = CommentedTreeBuilder()

if __name__ == '__main__':
    filename = "/opt/lucee/tomcat/conf/server.xml"

    # this is the important part: use the comment-preserving parser
    tree = ET.parse(filename, parser)

    # get the node to add a child to
    engine_node = tree.find("./Service/Engine")

    # add a node: Engine.Host
    host_node = ET.SubElement(
        engine_node,
        "Host",
        name="local.mysite.com",
        appBase="webapps"
    )
    # add a child to new node: Engine.Host.Context
    ET.SubElement(
        host_node,
        'Context',
        path="",
        docBase="/path/to/doc/base"
    )

    tree.write('out.xml')

#!/usr/bin/env python
# CommentedTreeBuilder.py

from xml.etree import ElementTree

class CommentedTreeBuilder ( ElementTree.XMLTreeBuilder ):
    def __init__ ( self, html = 0, target = None ):
        ElementTree.XMLTreeBuilder.__init__( self, html, target )
        self._parser.CommentHandler = self.handle_comment

    def handle_comment ( self, data ):
        self._target.start( ElementTree.Comment, {} )
        self._target.data( data )
        self._target.end( ElementTree.Comment )

However, a comment like this:

  <!--
EXAMPLE HOST ENTRY:
    <Host name="lucee.org" appBase="webapps">
         <Context path="" docBase="/var/sites/getrailo.org" />
     <Alias>www.lucee.org</Alias>
     <Alias>my.lucee.org</Alias>
    </Host>

HOST ENTRY TEMPLATE:
    <Host name="[ENTER DOMAIN NAME]" appBase="webapps">
         <Context path="" docBase="[ENTER SYSTEM PATH]" />
     <Alias>[ENTER DOMAIN ALIAS]</Alias>
    </Host>
  -->

ends up as:

  <!--
            EXAMPLE HOST ENTRY:
    &lt;Host name="lucee.org" appBase="webapps"&gt;
         &lt;Context path="" docBase="/var/sites/getrailo.org" /&gt;
         &lt;Alias&gt;www.lucee.org&lt;/Alias&gt;
         &lt;Alias&gt;my.lucee.org&lt;/Alias&gt;
    &lt;/Host&gt;

    HOST ENTRY TEMPLATE:
    &lt;Host name="[ENTER DOMAIN NAME]" appBase="webapps"&gt;
         &lt;Context path="" docBase="[ENTER SYSTEM PATH]" /&gt;
         &lt;Alias&gt;[ENTER DOMAIN ALIAS]&lt;/Alias&gt;
    &lt;/Host&gt;
   -->

I also tried self._target.data( saxutils.unescape(data) ) in CommentedTreeBuilder.py, but it didn't seem to do anything. In fact, I think the problem happens somewhere after the handle_comment() step.

By the way, this question is similar to this.

Recommended answer

The following code, tested with Python 2.7 and 3.5, should work as intended.

#!/usr/bin/env python
# CommentedTreeBuilder.py
from xml.etree import ElementTree

class CommentedTreeBuilder(ElementTree.TreeBuilder):
    def comment(self, data):
        self.start(ElementTree.Comment, {})
        self.data(data)
        self.end(ElementTree.Comment)

Then, in the main code, use

parser = ElementTree.XMLParser(target=CommentedTreeBuilder())

as the parser instead of the current one.
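To make the round trip concrete, here is a minimal self-contained sketch of the approach above, assuming Python 3 (it uses encoding="unicode"). The SAMPLE document and the add_host() helper are hypothetical stand-ins for server.xml and the original script, not part of the answer.

```python
from xml.etree import ElementTree as ET


class CommentedTreeBuilder(ET.TreeBuilder):
    """TreeBuilder that stores each comment as a node in the tree."""

    def comment(self, data):
        # Represent the comment as an element whose tag is ET.Comment,
        # so the serializer writes it back as <!--...-->.
        self.start(ET.Comment, {})
        self.data(data)
        self.end(ET.Comment)


# Hypothetical, trimmed-down stand-in for server.xml.
SAMPLE = """<Server>
  <Service>
    <Engine>
      <!-- TEMPLATE: <Host name="[ENTER DOMAIN NAME]" appBase="webapps"/> -->
    </Engine>
  </Service>
</Server>"""


def add_host(xml_text, name):
    """Parse xml_text with comments kept, add a Host node, return the XML."""
    parser = ET.XMLParser(target=CommentedTreeBuilder())
    root = ET.fromstring(xml_text, parser=parser)
    engine = root.find("./Service/Engine")
    ET.SubElement(engine, "Host", name=name, appBase="webapps")
    return ET.tostring(root, encoding="unicode")


result = add_host(SAMPLE, "local.mysite.com")
print(result)
```

The sample keeps its comment inside the root element. Comments placed before the root element are not exercised here and may need separate handling, so it is worth testing against the real file.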

By the way, comments work correctly out of the box with lxml. That is, you can just do

import lxml.etree as ET
tree = ET.parse(filename)

without needing any of the above.
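If installing lxml is not an option, Python 3.8 and later offer a similar shortcut in the standard library: TreeBuilder accepts an insert_comments flag, so no subclass is needed. This goes beyond the versions the answer was tested with, so treat it as a sketch under that assumption; the SAMPLE string is made up for illustration.

```python
from xml.etree import ElementTree as ET

# Hypothetical input with markup-like text inside a comment.
SAMPLE = "<root><!-- keep <this> as-is --><item/></root>"

# insert_comments=True (Python 3.8+) makes the builder keep comments
# in the tree instead of discarding them.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root = ET.fromstring(SAMPLE, parser=parser)

output = ET.tostring(root, encoding="unicode")
print(output)
```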
