Parsing a large XML file with Python - etree.parse error


Problem description

Trying to parse the following XML file using the lxml.etree.iterparse function.

"sampleoutput.xml"

<item>
  <title>Item 1</title>
  <desc>Description 1</desc>
</item>
<item>
  <title>Item 2</title>
  <desc>Description 2</desc>
</item>

I tried the code from "Parsing Large XML file with Python lxml and Iterparse".

Before the etree.iterparse(MYFILE) call I did MYFILE = open("/Users/eric/Desktop/wikipedia_map/sampleoutput.xml","r")

But it fails with the following error:

Traceback (most recent call last):
  File "/Users/eric/Documents/Programming/Eclipse_Workspace/wikipedia_mapper/testscraper.py", line 6, in <module>
    for event, elem in context :
  File "iterparse.pxi", line 491, in lxml.etree.iterparse.__next__ (src/lxml/lxml.etree.c:98565)
  File "iterparse.pxi", line 543, in lxml.etree.iterparse._read_more_events (src/lxml/lxml.etree.c:99086)
  File "parser.pxi", line 590, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:74712)
lxml.etree.XMLSyntaxError: Extra content at the end of the document, line 5, column 1

Any ideas? Thanks!

Recommended answer

The problem is that an XML document is not well-formed unless it has exactly one top-level element. The parser treats the first <item> as the whole document and reports the second one, which starts at line 5, as "Extra content at the end of the document". You can fix your sample by wrapping the entire document in <items></items> tags. You also need to rename the <desc> tags to <description> so that they match the tag your query looks for.
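If editing a large file on disk is inconvenient, the wrapping can also be done at parse time. The following is only a sketch of that idea, not code from the question: it feeds a synthetic <items> root to a pull parser, streams the file through in chunks, and then feeds the closing tag. The function name iter_items and the chunk_size parameter are made up for illustration, the stdlib fallback import is an assumption added so the sketch runs without lxml, and it assumes the file has no XML declaration (which would have to come before the synthetic root).

```python
import io

try:
    from lxml import etree  # the library used in the question
except ImportError:
    # Assumption: fall back to the standard library, whose XMLPullParser
    # offers the same feed/read_events/close interface.
    import xml.etree.ElementTree as etree


def iter_items(fileobj, chunk_size=65536):
    """Yield (title, desc) pairs from a binary file that has no single root element."""
    parser = etree.XMLPullParser(events=("end",))

    def drain():
        # Hand out every <item> that has been completely parsed so far.
        for _event, elem in parser.read_events():
            if elem.tag == "item":
                yield elem.findtext("title"), elem.findtext("desc")
                elem.clear()  # drop the children so memory use stays small

    parser.feed(b"<items>")  # synthetic opening root tag
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        parser.feed(chunk)
        yield from drain()
    parser.feed(b"</items>")  # synthetic closing root tag
    yield from drain()
    parser.close()


# The sample from the question, unchanged (two top-level <item> elements).
sample = b"""<item>
  <title>Item 1</title>
  <desc>Description 1</desc>
</item>
<item>
  <title>Item 2</title>
  <desc>Description 2</desc>
</item>
"""

pairs = list(iter_items(io.BytesIO(sample)))
print(pairs)
```

For a real file, open it in binary mode, for example open(path, "rb"), and pass the file object to iter_items.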

The following document produces correct results with your existing code:

<items>
  <item>
    <title>Item 1</title>
    <description>Description 1</description>
  </item>
  <item>
    <title>Item 2</title>
    <description>Description 2</description>
  </item>
</items>
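The question does not show the parsing loop itself, so the following is a guess at its general shape rather than the asker's actual code. It is a minimal sketch of reading the corrected document with iterparse, handling each <item> as soon as its end tag is seen. The name parse_items is hypothetical, and the stdlib fallback import is an assumption added so the example runs without lxml.

```python
import io

try:
    from lxml import etree  # the library used in the question
except ImportError:
    # Assumption: the standard library's iterparse is close enough for this sketch.
    import xml.etree.ElementTree as etree

# The corrected document from the answer, held in memory for the demonstration.
FIXED_XML = b"""<items>
  <item>
    <title>Item 1</title>
    <description>Description 1</description>
  </item>
  <item>
    <title>Item 2</title>
    <description>Description 2</description>
  </item>
</items>
"""


def parse_items(source):
    """Yield (title, description) pairs from a filename or binary file object."""
    for _event, elem in etree.iterparse(source, events=("end",)):
        if elem.tag == "item":
            yield elem.findtext("title"), elem.findtext("description")
            elem.clear()  # release the finished subtree; matters for large files


items = list(parse_items(io.BytesIO(FIXED_XML)))
print(items)
```

Clearing each element after use is what keeps memory bounded on large inputs; without it, iterparse still builds the complete tree as it goes.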
