Why am I leaking memory with this Python loop?

Question

I am writing a custom file system crawler that is passed millions of globs to process through sys.stdin. When I run the script, its memory usage grows massively over time and the whole thing slows practically to a halt. I've written a minimal case below that shows the problem. Am I doing something wrong, or have I found a bug in Python or the glob module? (I am using Python 2.5.2.)


#!/usr/bin/env python
import glob
import sys
import gc

previous_num_objects = 0

for count, line in enumerate(sys.stdin):
   glob_result = glob.glob(line.rstrip('\n'))
   current_num_objects = len(gc.get_objects())
   new_objects = current_num_objects - previous_num_objects

   # Labels match the sample output shown below.
   print "(%d) This: %d, New: %d, Python Garbage: %d, Python Collection Counts: %s" \
       % (count, current_num_objects, new_objects, len(gc.garbage), gc.get_count())
   previous_num_objects = current_num_objects

The output looks like this:


(0) This: 4042, New: 4042, Python Garbage: 0, Python Collection Counts: (660, 5, 0)
(1) This: 4061, New: 19, Python Garbage: 0, Python Collection Counts: (90, 6, 0)
(2) This: 4064, New: 3, Python Garbage: 0, Python Collection Counts: (127, 6, 0)
(3) This: 4067, New: 3, Python Garbage: 0, Python Collection Counts: (130, 6, 0)
(4) This: 4070, New: 3, Python Garbage: 0, Python Collection Counts: (133, 6, 0)
(5) This: 4073, New: 3, Python Garbage: 0, Python Collection Counts: (136, 6, 0)
(6) This: 4076, New: 3, Python Garbage: 0, Python Collection Counts: (139, 6, 0)
(7) This: 4079, New: 3, Python Garbage: 0, Python Collection Counts: (142, 6, 0)
(8) This: 4082, New: 3, Python Garbage: 0, Python Collection Counts: (145, 6, 0)
(9) This: 4085, New: 3, Python Garbage: 0, Python Collection Counts: (148, 6, 0)

Every 100th iteration, 100 objects are freed, so len(gc.get_objects()) grows by a net 200 every 100 iterations (about 3 new objects per iteration, minus the 100 freed). len(gc.garbage) never changes from 0. The 2nd generation collection count increases slowly, while the 0th and 1st counts go up and down.
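One way to narrow down growth like this is to tally the gc-tracked objects by type before and after a piece of work and then look at what increased. The following is only a sketch, written for Python 3 rather than the 2.5.2 interpreter used above. The helper names `type_counts` and `growth` and the `leak` list are hypothetical, introduced for illustration.

```python
import gc
from collections import Counter


def type_counts():
    """Tally gc-tracked objects by type name after forcing a collection."""
    gc.collect()  # drop collectable garbage so it does not skew the tally
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after):
    """Return a dict of type names whose tracked count went up."""
    # Counter subtraction keeps only the positive differences.
    return dict(after - before)


before = type_counts()
# Hypothetical stand-in for a structure that keeps growing, such as a cache.
leak = [[i] for i in range(500)]
after = type_counts()

delta = growth(before, after)
print(sorted(delta.items(), key=lambda item: -item[1])[:5])
```

A type whose count keeps rising across snapshots is a good candidate for closer inspection, for example with gc.get_referrers.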

Recommended answer

I tracked this down to the fnmatch module. glob.glob calls fnmatch to actually perform the globbing, and fnmatch keeps a cache of compiled regular expressions that is never cleared. With millions of distinct globs, that cache grows continuously and without bound. I've filed a bug against the fnmatch library [1].

[1]: http://bugs.python.org/issue7846 Python Bug