Python json memory bloat


Problem description



import json
import time
from itertools import count

def keygen(size):
    for i in count(1):
        s = str(i)
        yield '0' * (size - len(s)) + s

def jsontest(num):
    keys = keygen(20)
    kvjson = json.dumps(dict((keys.next(), '0' * 200) for i in range(num)))
    kvpairs = json.loads(kvjson)
    del kvpairs # Not required. Just to check if it makes any difference                            
    print 'load completed'

jsontest(500000)

while 1:
    time.sleep(1)

Linux top indicates that the Python process holds ~450 MB of RAM after the 'jsontest' function completes. If the call to 'json.loads' is omitted, the issue is not observed. An explicit gc.collect after this function executes does release the memory.

It looks like the memory is not being held in any caches or in Python's internal memory allocator, since an explicit call to gc.collect does release it.
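The effect of an explicit collection on the collector's counters can be seen with a small sketch (Python 3 syntax; the dict size here is illustrative, scaled down from the question's 500000 entries):

```python
import gc

# Build and discard a large dict of key/value strings, loosely mirroring
# the jsontest function above.
pairs = {str(i).zfill(20): '0' * 200 for i in range(10000)}
del pairs

before = gc.get_count()   # allocation counters accumulated so far
gc.collect()              # explicit full (generation 2) collection
after = gc.get_count()    # older-generation counters are reset to zero
```

Immediately after the forced collection, the generation 1 and generation 2 counters read zero, regardless of what they were before.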

Is this happening because the garbage-collection thresholds (700, 10, 10) were never reached?

I put some code after jsontest to try to reach the thresholds, but it didn't help.

Solution

Put this at the top of your program:

import gc
gc.set_debug(gc.DEBUG_STATS)

and you'll get printed output whenever there's a collection. You'll see that in your example code there is no collection after jsontest completes, until the program exits.

You can put

print gc.get_count()

to see the current counts. The first number is the excess of allocations over deallocations since the last collection of generation 0; the second (resp. third) is the number of times generation 0 (resp. 1) has been collected since the last collection of generation 1 (resp. 2). If you print these immediately after jsontest completes you'll see that the counts are (548, 6, 0) or something similar (no doubt this varies according to Python version). So the threshold was not reached and no collection took place.
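In Python 3 the same inspection looks like this (a minimal sketch; the exact counts will differ from the (548, 6, 0) quoted above and vary from run to run):

```python
import gc

# The collection thresholds discussed above; CPython's defaults are (700, 10, 10).
thresholds = gc.get_threshold()

# Current counters: (allocations minus deallocations since the last gen-0
# collection, gen-0 collections since the last gen-1 collection, gen-1
# collections since the last gen-2 collection).
counts = gc.get_count()
```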

This is typical behaviour for threshold-based garbage collection scheduling. If you need free memory to be returned to the operating system in a timely manner, then you need to combine threshold-based scheduling with time-based scheduling (that is, request another collection after a certain amount of time has passed since the last collection, even if the threshold has not been reached).
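One way to combine the two schedules is a background thread that forces a full collection at a fixed interval. This is a sketch in modern Python 3 (the thread, interval, and names are illustrative, not part of the original answer):

```python
import gc
import threading

def collect_periodically(interval, stop_event):
    # Request a full collection each time `interval` seconds pass without
    # the stop event being set, regardless of the allocation thresholds.
    while not stop_event.wait(interval):
        gc.collect()

stop = threading.Event()
collector = threading.Thread(target=collect_periodically,
                             args=(0.5, stop), daemon=True)
collector.start()

# ... the application would run here ...

stop.set()        # shut the collector down cleanly
collector.join()
```

Using `Event.wait` as the sleep means the thread wakes up immediately on shutdown instead of finishing a full sleep interval first.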
