Python xml etree DTD from a StringIO source?


Question

I'm adapting the following code (created via advice in this question), which takes an XML file and its DTD and converts them to a different format. For this problem, only the loading section is important:

xmldoc = open(filename)

parser = etree.XMLParser(dtd_validation=True, load_dtd=True)    
tree = etree.parse(xmldoc, parser)

This worked fine when using the file system, but I'm converting it to run via a web framework, where the two files are loaded via a form.

Loading the XML file works fine:

tree = etree.parse(StringIO(data['xml_file']))

But as the DTD is linked to at the top of the XML file, the following statement fails:

parser = etree.XMLParser(dtd_validation=True, load_dtd=True)
tree = etree.parse(StringIO(data['xml_file']), parser)

Following this question, I tried:

etree.DTD(StringIO(data['dtd_file']))
tree = etree.parse(StringIO(data['xml_file']))

While the first line doesn't raise an error, the second fails on the Unicode entities that the DTD is meant to define (and does define in the file-system version):

XMLSyntaxError: Entity 'eacute' not defined, line 4495, column 46

How do I go about correctly loading this DTD?
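For context on the attempt above: etree.DTD builds a standalone validator object and is not attached to any parser, so the parser never sees the entity declarations while it reads the document. The following is a minimal sketch of what etree.DTD can do on its own, assuming Python 3 and small inline DTD and XML strings invented for illustration (they are not the files from the question):

```python
from io import StringIO
from lxml import etree

# A standalone DTD validator built from in-memory text (illustrative content).
dtd = etree.DTD(StringIO("<!ELEMENT x (y)>\n<!ELEMENT y (#PCDATA)>"))

# Documents parsed with the default parser; no entity references are used here,
# because the default parser does not know about this DTD while parsing.
good_tree = etree.parse(StringIO("<x><y>zz</y></x>"))
bad_tree = etree.parse(StringIO("<x><z/></x>"))

# Validation happens after parsing, as a separate step.
good_is_valid = dtd.validate(good_tree)
bad_is_valid = dtd.validate(bad_tree)
```

This checks structure after the fact, but it cannot help with an entity such as &eacute;, which has to be known at parse time.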

Recommended answer

Here's a short but complete example, using the custom resolver technique @Steven mentioned.

try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3
from lxml import etree

data = dict(
    xml_file = '''<?xml version="1.0"?>
<!DOCTYPE x SYSTEM "a.dtd">
<x><y>&eacute;zz</y></x>
''',
    dtd_file = '''<!ENTITY eacute "&#233;">
<!ELEMENT x (y)>
<!ELEMENT y (#PCDATA)>
''')

class DTDResolver(etree.Resolver):
    def resolve(self, url, id, context):
        # Serve the uploaded DTD text for any external reference the parser requests.
        return self.resolve_string(data['dtd_file'], context)

xmldoc = StringIO(data['xml_file'])
parser = etree.XMLParser(dtd_validation=True, load_dtd=True)
parser.resolvers.add(DTDResolver())
try:
    tree = etree.parse(xmldoc, parser)
except etree.XMLSyntaxError as e:
    # handle xml and validation errors
    print(e)
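
The resolver above returns the same DTD text for every external reference. If more than one DTD may be uploaded, a variant could look at the requested URL first. The following is a rough sketch under assumptions: Python 3, and the names MappingResolver, uploaded_dtds and xml_text are hypothetical stand-ins for the form data, not part of lxml or of the original answer.

```python
import posixpath
from io import StringIO
from lxml import etree

# Hypothetical stand-in for the uploaded form data, keyed by file name.
uploaded_dtds = {
    "a.dtd": '<!ENTITY eacute "&#233;">\n<!ELEMENT x (y)>\n<!ELEMENT y (#PCDATA)>\n',
}
xml_text = '<?xml version="1.0"?>\n<!DOCTYPE x SYSTEM "a.dtd">\n<x><y>&eacute;zz</y></x>\n'


class MappingResolver(etree.Resolver):
    """Hypothetical resolver: looks up DTD text by the file name in the system URL."""

    def __init__(self, dtds):
        super().__init__()
        self.dtds = dtds

    def resolve(self, url, public_id, context):
        name = posixpath.basename(url or "")
        text = self.dtds.get(name)
        if text is None:
            return None  # not one of ours: leave it to lxml's default lookup
        return self.resolve_string(text, context)


parser = etree.XMLParser(dtd_validation=True, load_dtd=True)
parser.resolvers.add(MappingResolver(uploaded_dtds))
tree = etree.parse(StringIO(xml_text), parser)

# Text of <y>, with the entity expanded from the in-memory DTD.
y_text = tree.getroot()[0].text
```

Returning None for unknown names keeps the resolver from silently substituting the wrong DTD when a document points at something that was not uploaded.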
