Faithfully Preserve Comments in Parsed XML
Question
I'd like to preserve comments as faithfully as possible while manipulating XML.
I managed to preserve the comments, but their contents are getting XML-escaped.
#!/usr/bin/env python
# add_host_to_tomcat.py

import xml.etree.ElementTree as ET
from CommentedTreeBuilder import CommentedTreeBuilder
parser = CommentedTreeBuilder()

if __name__ == '__main__':
    filename = "/opt/lucee/tomcat/conf/server.xml"

    # this is the important part: use the comment-preserving parser
    tree = ET.parse(filename, parser)

    # get the node to add a child to
    engine_node = tree.find("./Service/Engine")

    # add a node: Engine.Host
    host_node = ET.SubElement(
        engine_node,
        "Host",
        name="local.mysite.com",
        appBase="webapps"
    )

    # add a child to new node: Engine.Host.Context
    ET.SubElement(
        host_node,
        'Context',
        path="",
        docBase="/path/to/doc/base"
    )

    tree.write('out.xml')

#!/usr/bin/env python
# CommentedTreeBuilder.py

from xml.etree import ElementTree

class CommentedTreeBuilder ( ElementTree.XMLTreeBuilder ):
    def __init__ ( self, html = 0, target = None ):
        ElementTree.XMLTreeBuilder.__init__( self, html, target )
        self._parser.CommentHandler = self.handle_comment

    def handle_comment ( self, data ):
        self._target.start( ElementTree.Comment, {} )
        self._target.data( data )
        self._target.end( ElementTree.Comment )
Comments like this:
<!--
EXAMPLE HOST ENTRY:
<Host name="lucee.org" appBase="webapps">
<Context path="" docBase="/var/sites/getrailo.org" />
<Alias>www.lucee.org</Alias>
<Alias>my.lucee.org</Alias>
</Host>
HOST ENTRY TEMPLATE:
<Host name="[ENTER DOMAIN NAME]" appBase="webapps">
<Context path="" docBase="[ENTER SYSTEM PATH]" />
<Alias>[ENTER DOMAIN ALIAS]</Alias>
</Host>
-->
end up as:
<!--
EXAMPLE HOST ENTRY:
&lt;Host name="lucee.org" appBase="webapps"&gt;
&lt;Context path="" docBase="/var/sites/getrailo.org" /&gt;
&lt;Alias&gt;www.lucee.org&lt;/Alias&gt;
&lt;Alias&gt;my.lucee.org&lt;/Alias&gt;
&lt;/Host&gt;
HOST ENTRY TEMPLATE:
&lt;Host name="[ENTER DOMAIN NAME]" appBase="webapps"&gt;
&lt;Context path="" docBase="[ENTER SYSTEM PATH]" /&gt;
&lt;Alias&gt;[ENTER DOMAIN ALIAS]&lt;/Alias&gt;
&lt;/Host&gt;
-->
I also tried self._target.data( saxutils.unescape(data) ) in CommentedTreeBuilder.py, but it didn't seem to do anything. In fact, I think the problem happens somewhere after the handle_comment() step.
This question is similar to this one.
Recommended Answer
Tested with Python 2.7 and 3.5, the following code should work as intended.
#!/usr/bin/env python
# CommentedTreeBuilder.py

from xml.etree import ElementTree

class CommentedTreeBuilder(ElementTree.TreeBuilder):
    def comment(self, data):
        self.start(ElementTree.Comment, {})
        self.data(data)
        self.end(ElementTree.Comment)
Then, in the main code, use
parser = ElementTree.XMLParser(target=CommentedTreeBuilder())
as the parser instead of the current one.
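Putting the pieces together, here is a minimal self-contained sketch of the approach. It parses from an in-memory string rather than server.xml, and the SAMPLE document and the add_host helper are made up for illustration, so treat it as a starting point rather than a drop-in script. It assumes the comment sits inside the root element.

```python
import xml.etree.ElementTree as ET


class CommentedTreeBuilder(ET.TreeBuilder):
    """TreeBuilder that keeps comments as Comment nodes in the tree."""

    def comment(self, data):
        self.start(ET.Comment, {})
        self.data(data)
        self.end(ET.Comment)


# Hypothetical stand-in for server.xml; the comment contains markup-like text.
SAMPLE = """<Server><Service><Engine>
<!-- TEMPLATE: <Host name="[ENTER DOMAIN NAME]" appBase="webapps"/> -->
</Engine></Service></Server>"""


def add_host(xml_text):
    """Parse xml_text, append a Host under Engine, and return the new XML."""
    parser = ET.XMLParser(target=CommentedTreeBuilder())
    root = ET.fromstring(xml_text, parser=parser)
    engine = root.find("./Service/Engine")
    ET.SubElement(engine, "Host", name="local.mysite.com", appBase="webapps")
    return ET.tostring(root, encoding="unicode")


result = add_host(SAMPLE)
print(result)
```

For a file on disk, the same parser object can be passed as the second argument to ET.parse, as in the question's script.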
By the way, comments work correctly out of the box with lxml. That is, you can just do
import lxml.etree as ET
tree = ET.parse(filename)
without needing any of the above.
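A related standard-library option, which is not part of the original answer: on Python 3.8 and later, TreeBuilder accepts an insert_comments=True argument, so a custom subclass may not be needed there. The sketch below assumes Python 3.8+ and uses a made-up sample document and helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the comment contains markup-like text.
SAMPLE = '<Engine><!-- <Host name="example"/> --><Host name="a"/></Engine>'


def parse_keeping_comments(xml_text):
    """Parse xml_text with comments kept in the tree (Python 3.8+ only)."""
    builder = ET.TreeBuilder(insert_comments=True)
    parser = ET.XMLParser(target=builder)
    return ET.fromstring(xml_text, parser=parser)


root = parse_keeping_comments(SAMPLE)
output = ET.tostring(root, encoding="unicode")
print(output)
```

On older interpreters this argument does not exist, so the CommentedTreeBuilder subclass or lxml remains the way to go.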