Using Python lxml.etree for huge XML files


Question

I would like to parse a huge XML file (>200 MB) using lxml.etree in Python. I tried to load the file with etree.parse, but this does not work due to the file size:

>>> etree.parse('file.xml')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lxml.etree.pyx", line 2706, in lxml.etree.parse (src/lxml/lxml.etree.c:49958)
  File "parser.pxi", line 1500, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:71797)
  File "parser.pxi", line 1529, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:72080)
  File "parser.pxi", line 1429, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:71175)
  File "parser.pxi", line 975, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:68173)
  File "parser.pxi", line 539, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:64257)
  File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:65178)
  File "parser.pxi", line 565, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:64521)
lxml.etree.XMLSyntaxError: Excessive depth in document: 256 use XML_PARSE_HUGE option, line 1276, column 7

Since I want to use XPath expressions, I have to parse the whole file first. How can I parse this XML file, and how do I use the XML_PARSE_HUGE option with lxml.etree?

Thanks!

Recommended answer

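Try creating a custom XMLParser instance with huge_tree=True, which is how lxml exposes libxml2's XML_PARSE_HUGE option: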

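from lxml.etree import XMLParser, parse

# huge_tree=True relaxes libxml2's safety limits (nesting depth, text size),
# so only use it on input you trust.
p = XMLParser(huge_tree=True)
tree = parse('file.xml', parser=p)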
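
Since the goal is to run XPath queries afterwards, here is a small self-contained sketch of the same idea. It does not need a real 200 MB file: it builds a made-up document in memory with 300 nested elements (the depth, the `node`/`leaf` element names and the `parse_deep_xml` helper are all assumptions for illustration, not anything from the question), parses it with huge_tree=True, and then queries it. Whether the default parser rejects this particular document depends on the libxml2 version lxml is built against, so the sketch only reports that outcome instead of relying on it.

```python
from io import BytesIO

from lxml import etree


def parse_deep_xml(source):
    """Parse `source` with libxml2's safety limits relaxed (huge_tree=True).

    Hypothetical helper for this example; `source` can be a filename or a
    file-like object, exactly as for etree.parse.
    """
    parser = etree.XMLParser(huge_tree=True)
    return etree.parse(source, parser=parser)


# Made-up test document: 300 nested <node> elements wrapping one <leaf>.
depth = 300
xml_bytes = (b"<node>" * depth) + b"<leaf>payload</leaf>" + (b"</node>" * depth)

# The default parser may refuse a document this deep; report what happens
# rather than assuming it, since the limit depends on the libxml2 build.
try:
    etree.parse(BytesIO(xml_bytes))
    print("default parser accepted the document")
except etree.XMLSyntaxError as exc:
    print("default parser failed:", exc)

# With huge_tree=True the document parses and XPath works as usual.
tree = parse_deep_xml(BytesIO(xml_bytes))
leaves = tree.xpath("//leaf")
node_count = int(tree.xpath("count(//node)"))
print(leaves[0].text, node_count)
```

For the real file, `parse_deep_xml('file.xml')` would be called the same way. Note that the whole tree still has to fit in memory, and that recent libxml2 releases may still enforce a (much higher) depth cap even with this option.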
