Parsing XML Entities with Python xml.sax


Question

I am parsing XML in Python with xml.sax, but my handler never sees the entities. Why is neither skippedEntity() nor resolveEntity() called in the following code?

import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, DTDHandler

# Class to parse and run test XML files
class TestHandler(ContentHandler, EntityResolver, DTDHandler):

    # SAX handler - Entity resolver
    def resolveEntity(self, publicID, systemID):
        print("TestHandler.resolveEntity: %s  %s" % (publicID, systemID))

    def skippedEntity(self, name):
        print("TestHandler.skippedEntity: %s" % name)

    # DTDHandler passes the entity name as the first argument
    def unparsedEntityDecl(self, name, publicID, systemID, ndata):
        print("TestHandler.unparsedEntityDecl: %s  %s" % (publicID, systemID))

    def startElement(self, name, attrs):
        # name = name.lower()
        summary = attrs.get('summary', '')
        arg = attrs.get('arg', '')
        print('TestHandler.startElement(), %s : %s (%s)' % (name, summary, arg))


def run(xml_string):
    try:
        parser = xml.sax.make_parser()
        stream = io.StringIO(xml_string)

        curHandler = TestHandler()
        parser.setContentHandler(curHandler)
        parser.setDTDHandler(curHandler)
        parser.setEntityResolver(curHandler)

        parser.parse(stream)
        stream.close()
    except xml.sax.SAXParseException as e:
        print("*** PARSER error: %s" % e)

def main():
    try:
        XML = "<!DOCTYPE page[ <!ENTITY num 'foo'> ]><test summary='step: &num;'>Entity: &not;</test>"
        run(XML)
    except Exception as e:
        print('FATAL ERROR: %s' % e)

if __name__ == '__main__':
    main()

When run, all I see is:

 TestHandler.startElement(), test : step: foo ()
 *** PARSER error: <unknown>:1:36: undefined entity

Why don't I see the resolveEntity print for &num;, or the skippedEntity print for &not;?

Recommended Answer

I think resolveEntity and skippedEntity are only called when the document references an external DTD. I got this to work by modifying the XML:

XML = """<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE test SYSTEM "external.dtd" >
<test summary='step: &foo; &bar;'>Entity: &not;</test>
"""

The file external.dtd contains two simple entity declarations:

<!ENTITY foo "bar">
<!ENTITY bar "foo">

I also removed the resolveEntity method from the handler.

This outputs:

TestHandler.startElement(), test : step: bar foo ()
TestHandler.skippedEntity: not
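
On Python 3.7.1 and later, xml.sax does not load external entities unless feature_external_ges is switched on, so the result above may not reproduce without that step. The following is a minimal sketch rather than the answerer's exact code. It assumes the DTD is served from memory by a resolveEntity that returns an InputSource. RecordingHandler, parse_events, DTD_BYTES and the two document constants are hypothetical names introduced for illustration. It parses both the external-DTD document and the internal-subset document from the question so the two behaviours can be compared.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

# Stand-in for external.dtd, served from memory instead of from disk (assumption).
DTD_BYTES = b'<!ENTITY foo "bar">\n<!ENTITY bar "foo">\n'

# The document from the answer: it references an external DTD.
EXTERNAL_DOC = b"""<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE test SYSTEM "external.dtd" >
<test summary='step: &foo; &bar;'>Entity: &not;</test>
"""

# The document from the question: internal subset only.
INTERNAL_DOC = (b"<!DOCTYPE page[ <!ENTITY num 'foo'> ]>"
                b"<test summary='step: &num;'>Entity: &not;</test>")


class RecordingHandler(ContentHandler, EntityResolver):
    """Hypothetical handler that records callbacks instead of printing them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def resolveEntity(self, publicId, systemId):
        self.events.append(("resolveEntity", systemId))
        # Return an InputSource (or a system id string), not None.
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(DTD_BYTES))
        return source

    def skippedEntity(self, name):
        self.events.append(("skippedEntity", name))

    def startElement(self, name, attrs):
        self.events.append(("startElement", name, attrs.get("summary", "")))


def parse_events(xml_bytes):
    """Parse xml_bytes and return (recorded events, error message or None)."""
    handler = RecordingHandler()
    parser = xml.sax.make_parser()
    # Off by default since Python 3.7.1; needed for the external DTD to load.
    parser.setFeature(feature_external_ges, True)
    parser.setContentHandler(handler)
    parser.setEntityResolver(handler)
    error = None
    try:
        parser.parse(io.BytesIO(xml_bytes))
    except xml.sax.SAXParseException as exc:
        error = exc.getMessage()
    return handler.events, error


external_events, external_error = parse_events(EXTERNAL_DOC)
internal_events, internal_error = parse_events(INTERNAL_DOC)
print(external_events, external_error)
print(internal_events, internal_error)
```

With the external DTD, the recorded events should include a resolveEntity call for external.dtd and a skippedEntity call for not. With only the internal subset, &num; is expanded by the parser without any callback and &not; ends the parse with an "undefined entity" error.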

Hope this helps.
