lxml unicode entity parse problems


Question

I'm using lxml as follows to parse an exported XML file from another system:

from lxml import etree

xmldoc = open(filename)
etree.parse(xmldoc)

But I get:

lxml.etree.XMLSyntaxError: Entity 'eacute' not defined, line 4495, column 46

Obviously it's having problems with Unicode entity names, but how would I get around this? Via open() or parse()?

Edit: I had forgotten to include my DTD in the same folder. It's there now and has the following declaration:

<!ENTITY eacute "&#233;">

and it is referred to (and always was) in xmldoc like so:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE DScribeDatabase SYSTEM "foo.dtd">

Yet I still get the same problem ... does the DTD need to be declared in Python too?

Recommended answer

eacute is not a predefined entity in XML. To include an &eacute; entity reference in an XML file, it must have a <!DOCTYPE> declaration pointing to a DTD (such as an XHTML 1.0 DTD) that defines the entity.
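
On the lxml side, declaring the DTD in the file may not be enough by itself, because the parser has to be told to read the external DTD. The following is a minimal sketch, not a definitive fix: it assumes the DTD is a trusted local file sitting next to the XML, and the file name export.xml, the sample content, and the helper parse_with_local_dtd are hypothetical stand-ins for the real export. It uses the load_dtd and resolve_entities options of etree.XMLParser.

```python
import os
import tempfile

from lxml import etree

# Hypothetical stand-ins for the exported file and its DTD.
DTD_TEXT = '<!ENTITY eacute "&#233;">\n'
XML_TEXT = (
    '<?xml version="1.0" encoding="ISO-8859-1" ?>\n'
    '<!DOCTYPE DScribeDatabase SYSTEM "foo.dtd">\n'
    '<DScribeDatabase>caf&eacute;</DScribeDatabase>\n'
)


def parse_with_local_dtd(xml_path):
    # load_dtd=True asks the parser to read the external DTD named in the
    # DOCTYPE; resolve_entities=True replaces &eacute; with its declared text.
    parser = etree.XMLParser(load_dtd=True, resolve_entities=True)
    return etree.parse(xml_path, parser)


with tempfile.TemporaryDirectory() as folder:
    # Write the DTD and the XML side by side so "foo.dtd" resolves
    # relative to the XML file's location.
    with open(os.path.join(folder, "foo.dtd"), "w", encoding="ascii") as f:
        f.write(DTD_TEXT)
    xml_path = os.path.join(folder, "export.xml")
    with open(xml_path, "w", encoding="iso-8859-1") as f:
        f.write(XML_TEXT)

    tree = parse_with_local_dtd(xml_path)
    root_text = tree.getroot().text
```

Passing the file path rather than an open file object gives the parser a base location from which the relative SYSTEM identifier can be resolved. Enabling DTD loading is only reasonable for documents from a trusted source.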

If the XML uses &eacute; but doesn't have a <!DOCTYPE>, it is not well-formed and the system that exported it needs to be fixed.

(There isn't a good reason to use an entity reference to represent é in an XML file. The character reference &#233; is understood everywhere without entity definitions, if the file can't simply include a raw UTF-8 é for some reason.)
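
The difference between the named entity and the numeric character reference can be checked with a few in-memory documents. This is only an illustrative sketch: the element name and sample text are made up, and the internal DTD subset in the last case is an extra option not mentioned above, shown as one more way to define the entity inside the file itself.

```python
from lxml import etree

# No DOCTYPE: the named entity is undefined, so parsing fails.
try:
    etree.fromstring("<name>Andr&eacute;</name>")
    entity_error = None
except etree.XMLSyntaxError as exc:
    entity_error = str(exc)

# The numeric character reference needs no DTD at all.
by_number = etree.fromstring("<name>Andr&#233;</name>").text

# An internal DTD subset defines the entity inside the document itself.
with_subset = etree.fromstring(
    '<!DOCTYPE name [<!ENTITY eacute "&#233;">]><name>Andr&eacute;</name>'
).text
```

If the exporting system can be changed, emitting &#233; or the raw character avoids any dependency on a DTD being available at parse time.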
