Python running out of memory parsing XML using cElementTree.iterparse

Question

A simplified version of my XML parsing function is here:

import xml.etree.cElementTree as ET

def analyze(xml):
    it = ET.iterparse(file(xml))
    count = 0

    for (ev, el) in it:
        count += 1

    print('count: {0}'.format(count))

This causes Python to run out of memory, which doesn't make much sense: the only thing I am actually storing is the count, an integer. Why is it doing this:

See that sudden drop in memory and CPU usage at the end? That's Python crashing spectacularly. At least it gives me a MemoryError (depending on what else I am doing in the loop, it gives me other, more random errors, such as an IndexError) and a stack trace instead of a segfault. But why is it crashing?

Recommended answer

Code example:

import xml.etree.cElementTree as etree

def getelements(filename_or_file, tag):
    context = iter(etree.iterparse(filename_or_file, events=('start', 'end')))
    _, root = next(context) # get root element
    for event, elem in context:
        if event == 'end' and elem.tag == tag:
            yield elem
            root.clear() # free memory: detach already-processed children from the root
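
The reason this helps: iterparse still builds the whole tree in memory, just like parse(), so every element stays attached to the root unless it is cleared, even if the loop only increments a counter. Below is a minimal sketch of the same pattern applied to the counting case from the question. It assumes Python 3, where the plain xml.etree.ElementTree module replaces cElementTree; the function name count_elements and the sample document are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET  # assumption: Python 3 (cElementTree was removed in 3.9)


def count_elements(source, tag):
    """Count <tag> elements in source while keeping the in-memory tree small."""
    context = iter(ET.iterparse(source, events=("start", "end")))
    _, root = next(context)  # the first event is the 'start' of the root element
    count = 0
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            count += 1
            root.clear()  # detach the finished children accumulated under the root
    return count


# Hypothetical sample document, built in memory for illustration only.
sample = io.BytesIO(b"<log>" + b"<entry><msg>hi</msg></entry>" * 1000 + b"</log>")
print(count_elements(sample, "entry"))  # prints 1000
```

Note that root.clear() is only called once a complete element of interest has been seen, so its contents are still intact at the moment it is counted (or yielded, in the generator version above).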
