ParseError: undefined entity while parsing XML file in Python
Question
I have a big XML file with several article nodes. I have included only the one that causes the problem. When I try to parse the file in Python to filter some data, I get this error:
File "<string>", line unknown
ParseError: undefined entity &Ouml;: line 90, column 17
Sample of the XML file
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE dblp SYSTEM "dblp.dtd">
<dblp>
<article mdate="2019-10-25" key="tr/gte/TR-0146-06-91-165" publtype="informal">
<author>Alejandro P. Buchmann</author>
<author>M. Tamer &Ouml;zsu</author>
<author>Dimitrios Georgakopoulos</author>
<title>Towards a Transaction Management System for DOM.</title>
<journal>GTE Laboratories Incorporated</journal>
<volume>TR-0146-06-91-165</volume>
<month>June</month>
<year>1991</year>
<url>db/journals/gtelab/index.html#TR-0146-06-91-165</url>
</article>
</dblp>
From searching Google, I found that this kind of error appears when there is a problem with the node names. However, the line with the error is the second author line, and the problem is in its text, not in the node name.
This is my Python code
import xml.etree.ElementTree as etree  # assumed import; the question does not show it

with open('xaa.xml', 'r') as xml_file:
    xml_tree = etree.parse(xml_file)
Recommended answer
The declaration of the Ouml entity is presumably in the DTD (dblp.dtd), but ElementTree does not support external DTDs. ElementTree only recognizes entities declared directly in the XML file (in the "internal subset"). This is a working example:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE dblp [
<!ENTITY Ouml 'Ö'>
]>
<dblp>
<article mdate="2019-10-25" key="tr/gte/TR-0146-06-91-165" publtype="informal">
<author>Alejandro P. Buchmann</author>
<author>M. Tamer &Ouml;zsu</author>
<author>Dimitrios Georgakopoulos</author>
<title>Towards a Transaction Management System for DOM.</title>
<journal>GTE Laboratories Incorporated</journal>
<volume>TR-0146-06-91-165</volume>
<month>June</month>
<year>1991</year>
<url>db/journals/gtelab/index.html#TR-0146-06-91-165</url>
</article>
</dblp>
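As a quick check, the following minimal sketch parses a shortened copy of the XML above with ElementTree and reads the author text. The helper name author_names and the trimmed-down document are illustrative and not part of the original answer. The text is encoded to ISO-8859-1 bytes so that it matches the encoding named in the XML declaration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the working example: the entity is declared in the
# internal subset, so ElementTree can expand it without reading dblp.dtd.
XML_TEXT = """<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE dblp [
<!ENTITY Ouml 'Ö'>
]>
<dblp>
<article mdate="2019-10-25" key="tr/gte/TR-0146-06-91-165" publtype="informal">
<author>Alejandro P. Buchmann</author>
<author>M. Tamer &Ouml;zsu</author>
</article>
</dblp>
"""


def author_names(xml_bytes):
    """Return the text of every <author> element (hypothetical helper)."""
    root = ET.fromstring(xml_bytes)
    return [author.text for author in root.iter("author")]


# Encode to the charset named in the XML declaration before parsing.
names = author_names(XML_TEXT.encode("iso-8859-1"))
print(names)
```

If the entity declaration is removed from the internal subset, the same call should fail with a ParseError again.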
To parse the XML file from the question as it is, without errors, you need a more powerful XML library that supports external DTDs. lxml is a good choice for that.
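A minimal sketch of the lxml route, assuming lxml is installed and that dblp.dtd sits in the same directory as the XML file. To keep the sketch self-contained it writes a one-line stand-in DTD and a shortened document to a temporary directory; the stand-in DTD, the helper name parse_with_external_dtd and the file layout are assumptions, and in practice you would use the real dblp.dtd. The load_dtd option tells the parser to read the external DTD, and resolve_entities=True is passed explicitly because its default differs between lxml versions.

```python
import os
import tempfile

from lxml import etree

# Stand-in for the real dblp.dtd: only the entity needed here is declared.
DTD_TEXT = '<!ENTITY Ouml "&#214;">\n'

XML_TEXT = """<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE dblp SYSTEM "dblp.dtd">
<dblp>
<article mdate="2019-10-25" key="tr/gte/TR-0146-06-91-165" publtype="informal">
<author>M. Tamer &Ouml;zsu</author>
</article>
</dblp>
"""


def parse_with_external_dtd(xml_path):
    """Parse a file, loading the DTD named in its DOCTYPE (hypothetical helper)."""
    parser = etree.XMLParser(load_dtd=True, resolve_entities=True)
    return etree.parse(xml_path, parser)


with tempfile.TemporaryDirectory() as workdir:
    # The DTD is looked up relative to the XML file, so both go in one folder.
    with open(os.path.join(workdir, "dblp.dtd"), "w", encoding="iso-8859-1") as f:
        f.write(DTD_TEXT)
    xml_path = os.path.join(workdir, "xaa.xml")
    with open(xml_path, "w", encoding="iso-8859-1") as f:
        f.write(XML_TEXT)

    tree = parse_with_external_dtd(xml_path)
    authors = [author.text for author in tree.iter("author")]

print(authors)
```

Only load DTDs this way for files you trust, since resolving entities from untrusted XML can expose local files or consume excessive memory.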