Python running out of memory parsing XML using cElementTree.iterparse
Question
A simplified version of my XML parsing function is here:
```python
import xml.etree.cElementTree as ET

def analyze(xml):
    it = ET.iterparse(file(xml))
    count = 0
    for (ev, el) in it:
        count += 1
    print('count: {0}'.format(count))
```
This causes Python to run out of memory, which doesn't make a whole lot of sense. The only thing I am actually storing is the count, an integer. Why is it doing this:
See that sudden drop in memory and CPU usage at the end? That's Python crashing spectacularly. At least it gives me a MemoryError (depending on what else I am doing in the loop, it gives me more random errors, like an IndexError) and a stack trace instead of a segfault. But why is it crashing?
Accepted answer
The documentation does tell you "Parses an XML section into an element tree [my emphasis] incrementally" but doesn't cover how to avoid retaining uninteresting elements (which may be all of them). That is covered by this article by the effbot.
I strongly recommend that anybody using .iterparse() should read this article by Liza Daly. It covers both lxml and [c]ElementTree.
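The key pattern from those articles is to clear each element once its end tag has been seen, so the partially built tree is discarded as you go instead of accumulating in memory. Below is a minimal sketch of that pattern, adapted from the question's function; it is an illustration, not the articles' exact code, and it uses xml.etree.ElementTree directly since the cElementTree alias was removed in Python 3.9.

```python
import xml.etree.ElementTree as ET

def count_elements(xml_path):
    # Request only "end" events: by the time an element's closing tag
    # is seen, its subtree is complete and can safely be discarded.
    count = 0
    for event, elem in ET.iterparse(xml_path, events=('end',)):
        count += 1
        elem.clear()  # drop the element's children, text, and attributes
    return count
```

Note that elem.clear() empties an element but leaves an empty stub attached to its parent, so the root still accumulates a list of cleared children; for truly huge single-root files, the articles above additionally delete the root's children periodically.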
Previous coverage on SO:
Using Python Iterparse For Large XML Files
Can Python xml ElementTree parse a very large xml file?
What is the fastest way to parse large XML docs in Python?