Python json memory bloat

Problem description
```python
import json
import time
from itertools import count

def keygen(size):
    for i in count(1):
        s = str(i)
        yield '0' * (size - len(s)) + s

def jsontest(num):
    keys = keygen(20)
    kvjson = json.dumps(dict((keys.next(), '0' * 200) for i in range(num)))
    kvpairs = json.loads(kvjson)
    del kvpairs  # Not required. Just to check if it makes any difference
    print 'load completed'

jsontest(500000)

while 1:
    time.sleep(1)
```
Linux top indicates that the python process holds ~450 MB of RAM after the 'jsontest' function completes. If the call to 'json.loads' is omitted, the issue is not observed. A gc.collect() after this function runs does release the memory.
So the memory does not appear to be held in any caches or in python's internal memory allocator, since an explicit call to gc.collect() releases it.
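One way to watch the resident set size from inside the process instead of eyeballing top (a Unix-only Python 3 sketch; `current_rss_kb` is an illustrative helper, not part of the original code):

```python
import resource

def current_rss_kb():
    """Best-effort resident set size of this process, in kB."""
    try:
        # Linux: read the current RSS from the proc filesystem.
        with open('/proc/self/status') as f:
            for line in f:
                if line.startswith('VmRSS:'):
                    return int(line.split()[1])  # value is in kB
    except OSError:
        pass
    # Fallback: peak (not current) RSS; kB on Linux, bytes on macOS.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

print('RSS now:', current_rss_kb(), 'kB')
```

Sampling this before and after the call to gc.collect() makes the release visible without an external tool.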
Is this happening because the threshold for garbage collection (700, 10, 10) was never reached?
I put some code after jsontest to push the counters past the threshold, but it didn't help.
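For reference, the thresholds and counters can be inspected directly. This is a Python 3 sketch of that check with the workload scaled down for a quick run; the printed numbers will vary:

```python
import gc
import json

def jsontest(num):
    # Same shape of workload as above: 20-char zero-padded keys,
    # 200-char values, round-tripped through dumps/loads.
    kvjson = json.dumps({str(i).zfill(20): '0' * 200 for i in range(num)})
    kvpairs = json.loads(kvjson)
    del kvpairs

jsontest(50000)

print(gc.get_threshold())  # (700, 10, 10) by default
print(gc.get_count())      # per-generation counters right now

# An explicit full collection is what actually returns the memory.
freed = gc.collect()
print('unreachable objects found:', freed)
```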
Answer

Put this at the top of your program:

```python
import gc
gc.set_debug(gc.DEBUG_STATS)
```
and you'll get printed output whenever there's a collection. You'll see that in your example code there is no collection after jsontest completes, until the program exits.
You can put

```python
print gc.get_count()
```
to see the current counts. The first number is the excess of allocations over deallocations since the last collection of generation 0; the second (resp. third) is the number of times generation 0 (resp. 1) has been collected since the last collection of generation 1 (resp. 2). If you print these immediately after jsontest completes, you'll see that the counts are (548, 6, 0) or something similar (no doubt this varies according to Python version). So the threshold was not reached and no collection took place.
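A small Python 3 illustration of how the generation-0 counter moves with allocations (exact values depend on the interpreter and what has already been allocated):

```python
import gc

gc.collect()          # a full collection resets the counters
before = gc.get_count()

# Keep a few hundred gc-tracked container objects alive; each dict
# bumps the generation-0 counter by one.
junk = [{'k': i} for i in range(500)]

after = gc.get_count()
print(before, after)  # generation-0 count grows by roughly 500
```

Because 500 is below the default generation-0 threshold of 700, no collection fires here, which is the same situation the question's jsontest ends up in.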
This is typical behaviour for threshold-based garbage collection scheduling. If you need free memory to be returned to the operating system in a timely manner, then you need to combine threshold-based scheduling with time-based scheduling (that is, request another collection after a certain amount of time has passed since the last collection, even if the threshold has not been reached).
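A minimal sketch of that combination (the 5-second interval and the use of a daemon `threading.Timer` are arbitrary choices for illustration, not something prescribed by the answer above):

```python
import gc
import threading

def periodic_collect(interval=5.0):
    """Run a full collection every `interval` seconds, regardless of
    whether the allocation thresholds have been reached."""
    gc.collect()
    timer = threading.Timer(interval, periodic_collect, args=(interval,))
    timer.daemon = True  # don't keep the process alive just for GC
    timer.start()
    return timer

timer = periodic_collect()
# ... the rest of the program runs as usual; unreachable cycles are now
# reclaimed at least once per interval, even when the counters are low.
```

Threshold-based collections still happen in between; the timer only guarantees an upper bound on how long garbage can linger.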