ParseError: undefined entity while parsing XML file in Python


Problem description

I have a large XML file with several article nodes; I have included only the one that causes the problem. When I try to parse it in Python to filter some data, I get this error:

File "<string>", line unknown
ParseError: undefined entity &Ouml;: line 90, column 17

Sample XML file

<?xml version="1.0" encoding="ISO-8859-1"?>
    <!DOCTYPE dblp SYSTEM "dblp.dtd">
    <dblp>
        <article mdate="2019-10-25" key="tr/gte/TR-0146-06-91-165" publtype="informal">
            <author>Alejandro P. Buchmann</author>
            <author>M. Tamer &Ouml;zsu</author>
            <author>Dimitrios Georgakopoulos</author>
            <title>Towards a Transaction Management System for DOM.</title>
            <journal>GTE Laboratories Incorporated</journal>
            <volume>TR-0146-06-91-165</volume>
            <month>June</month>
            <year>1991</year>
            <url>db/journals/gtelab/index.html#TR-0146-06-91-165</url>
        </article>
    </dblp>

From searching on Google, I found that this kind of error appears when there is a problem with the node names. However, the line reported in the error is the second author element, and the problem is inside its text.

This is my Python code:

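# Assumed import: the original snippet omitted it, and the error format matches ElementTree.
import xml.etree.ElementTree as etree

with open('xaa.xml', 'r') as xml_file:
    xml_tree = etree.parse(xml_file)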

Recommended answer

The Ouml entity is presumably declared in the DTD (dblp.dtd), but ElementTree does not support external DTDs. ElementTree only recognizes entities declared directly in the XML file (in the "internal subset"). This is a working example:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE dblp [
<!ENTITY Ouml 'Ö'>
]>
<dblp>
  <article mdate="2019-10-25" key="tr/gte/TR-0146-06-91-165" publtype="informal">
    <author>Alejandro P. Buchmann</author>
    <author>M. Tamer &Ouml;zsu</author>
    <author>Dimitrios Georgakopoulos</author>
    <title>Towards a Transaction Management System for DOM.</title>
    <journal>GTE Laboratories Incorporated</journal>
    <volume>TR-0146-06-91-165</volume>
    <month>June</month>
    <year>1991</year>
    <url>db/journals/gtelab/index.html#TR-0146-06-91-165</url>
  </article>
</dblp>
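
To check that the internal-subset version really parses, here is a minimal sketch using the standard-library ElementTree. The document is trimmed to two authors and embedded as a string for illustration; the variable names are not from the original post.

```python
import xml.etree.ElementTree as ET

# The Ouml entity is declared in the internal subset, so ElementTree can resolve it.
xml_text = """<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE dblp [
<!ENTITY Ouml 'Ö'>
]>
<dblp>
  <article key="tr/gte/TR-0146-06-91-165">
    <author>Alejandro P. Buchmann</author>
    <author>M. Tamer &Ouml;zsu</author>
  </article>
</dblp>
"""

# Encode to bytes so the actual bytes match the declared ISO-8859-1 encoding.
root = ET.fromstring(xml_text.encode("iso-8859-1"))

# Collect the text of every author element; the entity is expanded during parsing.
authors = [author.text for author in root.iter("author")]
print(authors)
```

The second author should come back with the entity already expanded to Ö, and no ParseError is raised.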

To parse the XML file from the question without errors, you need a more capable XML library that supports external DTDs. lxml is a good choice for that.
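
As a rough sketch of the lxml route, the snippet below writes a stand-in dblp.dtd and a trimmed XML file into a temporary directory, then parses the file with `load_dtd=True`. The one-line DTD is an assumption made purely for illustration (the real dblp.dtd declares many more entities and elements), and lxml is a third-party package that has to be installed separately.

```python
import os
import tempfile

from lxml import etree  # third-party: pip install lxml

# Stand-in for the real dblp.dtd (assumption): declares only the entity used below.
DTD_TEXT = '<!ENTITY Ouml "&#214;">\n'

XML_BYTES = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE dblp SYSTEM "dblp.dtd">
<dblp>
  <article key="tr/gte/TR-0146-06-91-165">
    <author>M. Tamer &Ouml;zsu</author>
  </article>
</dblp>
"""

with tempfile.TemporaryDirectory() as tmp_dir:
    # Put the DTD next to the XML file so the relative SYSTEM identifier resolves.
    with open(os.path.join(tmp_dir, "dblp.dtd"), "w", encoding="ascii") as dtd_file:
        dtd_file.write(DTD_TEXT)
    xml_path = os.path.join(tmp_dir, "xaa.xml")
    with open(xml_path, "wb") as xml_file:
        xml_file.write(XML_BYTES)

    # load_dtd=True asks lxml to read the external DTD named in the DOCTYPE.
    parser = etree.XMLParser(load_dtd=True)
    tree = etree.parse(xml_path, parser)
    authors = [author.text for author in tree.iter("author")]

print(authors)
```

For the actual file from the question, the same parser would be pointed at xaa.xml, with the real dblp.dtd sitting in the same directory.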
