Python running out of memory parsing XML using cElementTree.iterparse

Question

A simplified version of my XML parsing function is here:

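import xml.etree.cElementTree as ET  # on Python 3.9+, use xml.etree.ElementTree instead

def analyze(xml):
    it = ET.iterparse(open(xml, 'rb'))  # open() instead of the Python 2-only file()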
    count = 0

    for (ev, el) in it:
        count += 1

    print('count: {0}'.format(count))

This causes Python to run out of memory, which doesn't make a whole lot of sense. The only thing I am actually storing is the count, an integer. Why is it doing this:

See that sudden drop in memory and CPU usage at the end? That's Python crashing spectacularly. At least it gives me a MemoryError (depending on what else I am doing in the loop, it gives me more random errors, like an IndexError) and a stack trace instead of a segfault. But why is it crashing?

Recommended answer

The documentation does tell you "Parses an XML section into an element tree [my emphasis] incrementally" but doesn't cover how to avoid retaining uninteresting elements (which may be all of them). That is covered by this article by the effbot.

I strongly recommend that anybody using .iterparse() should read this article by Liza Daly. It covers both lxml and [c]ElementTree.
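
As an illustration of avoiding retained elements, here is a minimal sketch of one commonly used technique: clear each element once it has been handled, and clear the root so that emptied children do not pile up under it. This is not quoted from either article. The function name count_elements and the sample document are hypothetical, and the sketch assumes Python 3, where the module is xml.etree.ElementTree.

```python
import io
import xml.etree.ElementTree as ET  # assumption: Python 3 (cElementTree is gone in 3.9+)


def count_elements(source):
    """Count elements in an XML source without keeping the whole tree in memory.

    `source` may be a filename or a binary file object (hypothetical helper).
    """
    # Request 'start' events as well, so the very first event yields the root.
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)

    count = 0
    for event, elem in context:
        if event == "end":
            count += 1
            # Release this element's text, attributes and children.
            elem.clear()
            # Drop the (now empty) children still referenced by the root.
            root.clear()
    return count


if __name__ == "__main__":
    sample = io.BytesIO(b"<root><a><b/></a><a/></root>")
    print("count: {0}".format(count_elements(sample)))
```

In a real job you would usually clear only after a complete record of interest has been processed (for example, when a particular tag closes) rather than on every end event, since clearing earlier would discard data you still need.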

Previous coverage on SO:

Using Python Iterparse For Large XML Files
Can Python xml ElementTree parse a very large xml file?
What is the fastest way to parse large XML docs in Python?
