Getting a memory error when parsing a large XML file in Python
Problem description
My XML file looks like this:
```xml
<root>
  <group from="1" to="100">
    <link target="1"/>
    ...
    <link target="100"/>
  </group>
  ...
</root>
```
I have 6000 `<group>` elements and 5M `<link>` elements. I want a dictionary with the tuple (`from`, `to`) as keys and the list of the `<link>` elements' `target` attributes as values, but I get a memory error with the following code:
```python
from lxml import etree
from gzip import open as gopen

def extractTargets(fin):
    targets = dict()
    with gopen(fin) as xml:
        context = etree.iterparse(xml, tag="group")
        for event, elem in context:
            targets[(elem.get("from"), elem.get("to"))] = elem.xpath("link/@target")
            # Free the processed <group> and any siblings parsed before it
            elem.clear()
            while elem.getprevious() is not None:
                del elem.getparent()[0]
        del context
    return targets
```
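A likely reason the loop above still runs out of memory: lxml's `xpath()` returns attribute values as "smart strings" that keep a reference to the element they came from, so the cleared `<link>` elements can stay alive through the lists stored in `targets` (lxml's `xpath()` accepts `smart_strings=False` to return plain strings instead). For comparison, here is a minimal sketch of the same streaming approach with the standard library's `xml.etree.ElementTree.iterparse`, where `get()` returns ordinary `str` objects. The function name `extract_targets_et` and the tiny in-memory sample are illustrative assumptions, not part of the question.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def extract_targets_et(fileobj):
    """Return {(from, to): [target, ...]} from a binary XML stream.

    Hypothetical stdlib variant of extractTargets: processed elements are
    dropped from the tree as soon as their <group> has been read.
    """
    targets = {}
    context = ET.iterparse(fileobj, events=("start", "end"))
    _, root = next(context)  # the first event is the start of <root>
    for event, elem in context:
        if event == "end" and elem.tag == "group":
            targets[(elem.get("from"), elem.get("to"))] = [
                link.get("target") for link in elem.iter("link")
            ]
            # Detach everything parsed so far from <root> so it can be freed
            root.clear()
    return targets


# Illustrative input: a tiny gzipped document with two groups
sample = (b'<root><group from="1" to="2"><link target="1"/><link target="2"/>'
          b'</group><group from="3" to="3"><link target="3"/></group></root>')
with gzip.open(io.BytesIO(gzip.compress(sample))) as f:
    print(extract_targets_et(f))
```

Clearing the root after each finished `<group>` keeps only the element currently being parsed in the tree; the resulting dictionary of strings still has to fit in memory.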
Recommended answer
Try the following code, which uses an lxml parser target instead of building a tree:
```python
import lxml.etree
from gzip import open as gopen

class GroupDictTarget(object):
    """Parser target that fills a dict during parsing; no tree is built."""
    def __init__(self, d):
        self.d = d

    def start(self, tag, attrib):
        if tag == 'group':
            # New (from, to) key; the following <link> targets go into this list
            self.group = self.d[attrib['from'], attrib['to']] = []
        elif tag == 'link':
            self.group.append(attrib['target'])

    def close(self):
        pass

def extractTargets(fin):
    with gopen(fin) as xml:
        targets = {}
        parser = lxml.etree.XMLParser(target=GroupDictTarget(targets))
        lxml.etree.parse(xml, parser)
    return targets
```
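The parser-target idea is not specific to lxml. The standard library's `xml.etree.ElementTree.XMLParser` also accepts a `target` object, calls its `start(tag, attrib)` method for each opening tag, and returns the result of its `close()` method, without building a tree. Below is a hedged sketch that reuses the same `GroupDictTarget` class with that parser and feeds it the decompressed data in chunks; `extract_targets_stdlib`, the chunk size, and the sample document are assumptions made for illustration.

```python
import gzip
import io
import xml.etree.ElementTree as ET


class GroupDictTarget(object):
    # Same target class as in the lxml version of the answer
    def __init__(self, d):
        self.d = d

    def start(self, tag, attrib):
        if tag == 'group':
            self.group = self.d[attrib['from'], attrib['to']] = []
        elif tag == 'link':
            self.group.append(attrib['target'])

    def close(self):
        pass


def extract_targets_stdlib(fileobj, chunk_size=64 * 1024):
    """Hypothetical stdlib variant: stream bytes into an XMLParser with a target."""
    targets = {}
    parser = ET.XMLParser(target=GroupDictTarget(targets))
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)
    parser.close()  # returns whatever GroupDictTarget.close() returns (None here)
    return targets


sample = (b'<root><group from="1" to="2"><link target="1"/><link target="2"/>'
          b'</group><group from="3" to="3"><link target="3"/></group></root>')
with gzip.open(io.BytesIO(gzip.compress(sample))) as f:
    print(extract_targets_stdlib(f))
```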
Using `xml.parsers.expat`:
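```python
import xml.parsers.expat
from gzip import open as gopen

class GroupDictTarget(object):
    # Same as in the lxml version above
    def __init__(self, d):
        self.d = d
    def start(self, tag, attrib):
        if tag == 'group':
            self.group = self.d[attrib['from'], attrib['to']] = []
        elif tag == 'link':
            self.group.append(attrib['target'])
    def close(self):
        pass

def extractTargets(fin):
    targets = {}
    p = xml.parsers.expat.ParserCreate()
    # expat calls StartElementHandler(name, attrs) with attrs as a dict
    p.StartElementHandler = GroupDictTarget(targets).start
    with gopen(fin) as f:
        p.ParseFile(f)
    return targets
```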
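`ParseFile()` already reads the file incrementally, so the expat version never holds the whole document. If the data arrives in pieces, or you would rather not write a helper class, the same logic can be driven through `Parse(data, isfinal)` with a closure as the handler. A minimal sketch, where `extract_targets_chunked`, its `chunk_size` parameter, and the sample input are hypothetical:

```python
import io
import xml.parsers.expat


def extract_targets_chunked(fileobj, chunk_size=64 * 1024):
    """Hypothetical variant: feed expat explicitly, one chunk at a time."""
    targets = {}
    current = None  # list of targets for the <group> being parsed

    def start(name, attrs):
        # expat passes the attributes as a plain dict
        nonlocal current
        if name == 'group':
            current = targets[attrs['from'], attrs['to']] = []
        elif name == 'link':
            current.append(attrs['target'])

    p = xml.parsers.expat.ParserCreate()
    p.StartElementHandler = start
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        p.Parse(chunk, False)
    p.Parse(b'', True)  # tell expat the input is complete
    return targets


sample = (b'<root><group from="1" to="2"><link target="1"/><link target="2"/>'
          b'</group><group from="3" to="3"><link target="3"/></group></root>')
# A small chunk size forces chunk boundaries inside tags
print(extract_targets_chunked(io.BytesIO(sample), chunk_size=7))
```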
Using `xml.sax`:
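```python
import xml.sax
from gzip import open as gopen

class GroupDictTarget(object):
    # Same as in the lxml version above
    def __init__(self, d):
        self.d = d
    def start(self, tag, attrib):
        if tag == 'group':
            self.group = self.d[attrib['from'], attrib['to']] = []
        elif tag == 'link':
            self.group.append(attrib['target'])
    def close(self):
        pass

def extractTargets(fin):
    targets = {}
    handler = xml.sax.handler.ContentHandler()
    # The bound method replaces startElement(name, attrs) on this instance
    handler.startElement = GroupDictTarget(targets).start
    with gopen(fin) as f:
        xml.sax.parse(f, handler)
    return targets
```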
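Assigning `startElement` on a `ContentHandler` instance works because the expat-based SAX driver calls `startElement` through the handler object, but the more common idiom is to subclass `ContentHandler`. A sketch of that variant, with the hypothetical names `GroupLinkHandler` and `extract_targets_sax`:

```python
import io
import xml.sax
import xml.sax.handler


class GroupLinkHandler(xml.sax.handler.ContentHandler):
    """Hypothetical handler collecting <link> targets per (from, to) group."""

    def __init__(self):
        super().__init__()
        self.targets = {}
        self._current = None  # list for the <group> being parsed

    def startElement(self, name, attrs):
        if name == 'group':
            self._current = self.targets[attrs['from'], attrs['to']] = []
        elif name == 'link':
            self._current.append(attrs['target'])


def extract_targets_sax(fileobj):
    handler = GroupLinkHandler()
    xml.sax.parse(fileobj, handler)
    return handler.targets


sample = (b'<root><group from="1" to="2"><link target="1"/><link target="2"/>'
          b'</group><group from="3" to="3"><link target="3"/></group></root>')
print(extract_targets_sax(io.BytesIO(sample)))
```

The subclass keeps the result on the handler, so no dictionary has to be passed in; a file object returned by `gzip.open(fin)` can be passed as `fileobj` in the same way.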