Subclassing ElementTree parser to retain comments

Problem description

I am trying to use ElementTree to parse XML files. Since the parser does not retain comments by default, I used the following code from http://bugs.python.org/issue8277:

import xml.etree.ElementTree as etree

class CommentedTreeBuilder(etree.TreeBuilder):
    """A TreeBuilder subclass that retains comments."""

    def comment(self, data):
        self.start(etree.Comment, {})
        self.data(data)
        self.end(etree.Comment)

parser = etree.XMLParser(target = CommentedTreeBuilder())

The above is in documents.py. Tested with:

import os
import sys
import unittest
import xml.etree.ElementTree as etree

import documents  # the module shown above (CommentedTreeBuilder and parser)

class TestDocument(unittest.TestCase):

    def setUp(self):
        filename = os.path.join(sys.path[0], "data", "facilities.xml")
        self.doc = etree.parse(filename, parser = documents.parser)

    def testClass(self):
        print("Class is {0}.".format(self.doc.__class__.__name__))
        #commented out tests.

if __name__ == '__main__':
    unittest.main()

This raises:

Traceback (most recent call last):
  File "/home/goncalo/documents/games/ja2/modding/mods/xml-overhaul/src/scripts/../tests/test_documents.py", line 24, in setUp
    self.doc = etree.parse(filename, parser = documents.parser)
  File "/usr/lib/python3.3/xml/etree/ElementTree.py", line 1242, in parse
    tree.parse(source, parser)
  File "/usr/lib/python3.3/xml/etree/ElementTree.py", line 1726, in parse
    parser.feed(data)
IndexError: pop from empty stack

What am I doing wrong? By the way, the XML in the file is valid (as checked by an independent program) and is UTF-8 encoded.

Notes:

  • Using Python 3.3 on Kubuntu 13.04, in case it matters. I made sure to run the test script with "python3" (and not just "python").

Edit: here is the sample XML file used; it is very small (let's see if I can get the formatting right):

<?xml version="1.0" encoding="utf-8"?>
<!-- changes to facilities.xml by G. Rodrigues: ar overhaul.-->
<SECTORFACILITIES>
    <!-- Drassen -->
    <!-- Small airport -->
    <FACILITY>
        <SectorGrid>B13</SectorGrid>
        <FacilityType>4</FacilityType>
        <ubHidden>0</ubHidden>
    </FACILITY>
</SECTORFACILITIES>

Recommended answer

The example XML you added works for me in Python 2.7, but breaks in 3.3 with the stack trace you described.

The problem seems to be the very first comment: the one after the XML declaration and before the first element. In 2.7 it does not become part of the tree (though no exception is raised), and in 3.3 it causes the exception.

See Python issue #17901: in Python 3.4, which contains the fix from that issue, "pop from empty stack" no longer occurs, but "ParseError: multiple elements on top level" is raised instead.

Which makes sense: if you want to retain the comments in the tree, they need to be treated as nodes. XML allows only one element at the top level of the document, and this builder turns every comment into an element node, so you can't have a comment before the first "real" element (if you force the parser to retain comments).
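To make "comments as nodes" concrete, here is a minimal sketch. It assumes Python 3.8 or later, where TreeBuilder accepts an insert_comments argument; that option postdates the Python 3.3 setup in the question, so treat it as an illustration rather than a fix for that version. SAMPLE is a shortened copy of the file from the question.

```python
import xml.etree.ElementTree as etree

# Shortened copy of the sample file from the question.
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<!-- changes to facilities.xml by G. Rodrigues: ar overhaul.-->
<SECTORFACILITIES>
    <!-- Drassen -->
    <!-- Small airport -->
    <FACILITY>
        <SectorGrid>B13</SectorGrid>
    </FACILITY>
</SECTORFACILITIES>
"""

# Assumption: Python 3.8+, where TreeBuilder takes insert_comments.
parser = etree.XMLParser(target=etree.TreeBuilder(insert_comments=True))
root = etree.fromstring(SAMPLE, parser=parser)

# A retained comment is an ordinary child node whose tag is the
# etree.Comment function instead of a string.
inner_comments = [node.text.strip() for node in root if node.tag is etree.Comment]

print(root.tag)
print(inner_comments)
```

In this setup the two comments inside SECTORFACILITIES show up as children of the root, while the comment before the root element has no parent node to attach to and is not part of the resulting tree.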

So unfortunately I think that's your only option: remove the comments outside the root document node from your XML files, either in the original files or by stripping them before parsing.
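A rough sketch of the "strip them before parsing" route follows. strip_outer_comments is a hypothetical helper, not part of ElementTree, and it is regex based: it assumes the document has no DOCTYPE internal subset and no processing instructions after the root element. CommentedTreeBuilder is the class from the question, and SAMPLE is a shortened copy of the question's file.

```python
import re
import xml.etree.ElementTree as etree


class CommentedTreeBuilder(etree.TreeBuilder):
    """TreeBuilder subclass from the question: keeps comments as nodes."""

    def comment(self, data):
        self.start(etree.Comment, {})
        self.data(data)
        self.end(etree.Comment)


COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_outer_comments(text):
    """Hypothetical helper: drop comments outside the root element."""
    # Blank out every comment with spaces of the same length, so that
    # offsets found in the masked text still match the original text.
    masked = COMMENT_RE.sub(lambda m: " " * len(m.group(0)), text)
    # First tag that is neither "<?...?>" nor "<!...>" is the root start tag.
    root_start = re.search(r"<[^?!]", masked).start()
    # Assumption: the last ">" outside comments closes the root element.
    root_end = masked.rindex(">") + 1
    head = COMMENT_RE.sub("", text[:root_start])
    tail = COMMENT_RE.sub("", text[root_end:])
    return head + text[root_start:root_end] + tail


SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<!-- changes to facilities.xml by G. Rodrigues: ar overhaul.-->
<SECTORFACILITIES>
    <!-- Drassen -->
    <!-- Small airport -->
    <FACILITY>
        <SectorGrid>B13</SectorGrid>
    </FACILITY>
</SECTORFACILITIES>
"""

cleaned = strip_outer_comments(SAMPLE)
parser = etree.XMLParser(target=CommentedTreeBuilder())
root = etree.fromstring(cleaned, parser=parser)
kept = [node.text.strip() for node in root if node.tag is etree.Comment]
print(kept)
```

Only the comments outside the root are removed; the ones inside SECTORFACILITIES are passed through untouched and end up in the tree. For a file on disk, the same idea would apply to the text read from the file before handing it to the parser.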
