忠实地保留已解析 XML 中的注释 [英] Faithfully Preserve Comments in Parsed XML
问题描述
我希望在操作 XML 时尽可能忠实地保留注释.
I'd like to preserve comments as faithfully as possible while manipulating XML.
我设法保留了注释,但内容正在被 XML 转义.
I managed to preserve comments, but the contents are getting XML-escaped.
#!/usr/bin/env python
# add_host_to_tomcat.py
import xml.etree.ElementTree as ET
from CommentedTreeBuilder import CommentedTreeBuilder
parser = CommentedTreeBuilder()
if __name__ == '__main__':
filename = "/opt/lucee/tomcat/conf/server.xml"
# this is the important part: use the comment-preserving parser
tree = ET.parse(filename, parser)
# get the node to add a child to
engine_node = tree.find("./Service/Engine")
# add a node: Engine.Host
host_node = ET.SubElement(
engine_node,
"Host",
name="local.mysite.com",
appBase="webapps"
)
# add a child to new node: Engine.Host.Context
ET.SubElement(
host_node,
'Context',
path="",
docBase="/path/to/doc/base"
)
tree.write('out.xml')
#!/usr/bin/env python
# CommentedTreeBuilder.py
from xml.etree import ElementTree
class CommentedTreeBuilder ( ElementTree.XMLTreeBuilder ):
def __init__ ( self, html = 0, target = None ):
ElementTree.XMLTreeBuilder.__init__( self, html, target )
self._parser.CommentHandler = self.handle_comment
def handle_comment ( self, data ):
self._target.start( ElementTree.Comment, {} )
self._target.data( data )
self._target.end( ElementTree.Comment )
然而,评论如下:
<!--
EXAMPLE HOST ENTRY:
<Host name="lucee.org" appBase="webapps">
<Context path="" docBase="/var/sites/getrailo.org" />
<Alias>www.lucee.org</Alias>
<Alias>my.lucee.org</Alias>
</Host>
HOST ENTRY TEMPLATE:
<Host name="[ENTER DOMAIN NAME]" appBase="webapps">
<Context path="" docBase="[ENTER SYSTEM PATH]" />
<Alias>[ENTER DOMAIN ALIAS]</Alias>
</Host>
-->
最终为:
<!--
EXAMPLE HOST ENTRY:
<Host name="lucee.org" appBase="webapps">
<Context path="" docBase="/var/sites/getrailo.org" />
<Alias>www.lucee.org</Alias>
<Alias>my.lucee.org</Alias>
</Host>
HOST ENTRY TEMPLATE:
<Host name="[ENTER DOMAIN NAME]" appBase="webapps">
<Context path="" docBase="[ENTER SYSTEM PATH]" />
<Alias>[ENTER DOMAIN ALIAS]</Alias>
</Host>
-->
我也在 CommentedTreeBuilder.py
中尝试了 self._target.data( saxutils.unescape(data) )
,但它似乎没有做任何事情.事实上,我认为问题发生在 handle_commment()
步骤之后.
I also tried self._target.data( saxutils.unescape(data) )
in CommentedTreeBuilder.py
, but it didn't seem to do anything. In fact, I think the problem happens somewhere after the handle_commment()
step.
顺便说一句,这个问题类似于这个.
By the way, this question is similar to this.
推荐答案
使用 Python 2.7 和 3.5 测试,以下代码应该可以正常工作.
Tested with Python 2.7 and 3.5, the following code should work as intended.
#!/usr/bin/env python
# CommentedTreeBuilder.py
from xml.etree import ElementTree
class CommentedTreeBuilder(ElementTree.TreeBuilder):
def comment(self, data):
self.start(ElementTree.Comment, {})
self.data(data)
self.end(ElementTree.Comment)
然后,在主代码中使用
parser = ElementTree.XMLParser(target=CommentedTreeBuilder())
作为解析器而不是当前的解析器.
as the parser instead of the current one.
顺便说一句,注释可以使用 lxml
开箱即用.也就是说,你可以做
By the way, comments work correctly out of the box with lxml
. That is, you can just do
import lxml.etree as ET
tree = ET.parse(filename)
不需要以上任何一项.
这篇关于忠实地保留已解析 XML 中的注释的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!