How is memory garbage collected in app engine (python) when iterating over db results


Question

I have some code that iterates over DB entities, and runs in a task - see below.

On App Engine I'm getting an "Exceeded soft private memory limit" error, and checking memory_usage().current() confirms the problem. See below for the output of a logging statement. It seems that every time a batch of foos is fetched, memory usage goes up.

My question is: why is the memory not being garbage collected? I would expect that in each iteration of the loops (the while loop and the for loop, respectively) the re-use of the names foos and foo would cause the objects to which foos and foo used to point to become 'de-referenced' (i.e. inaccessible), and therefore eligible for garbage collection, and then be garbage collected as memory gets tight. But evidently that is not happening.

import logging

from google.appengine.api.runtime import memory_usage

batch_size = 10
dict_of_results = {}
results = 0
cursor = None

while True:
  foos = models.Foo.all().filter('status =', 6)
  if cursor:
     foos.with_cursor(cursor)

  for foo in foos.run(batch_size=batch_size):

     logging.debug('on result #{} used memory of {}'.format(results, memory_usage().current()))
     results += 1

     bar = some_module.get_bar(foo)

     if bar:
        try:
           dict_of_results[bar.baz] += 1
        except KeyError:
           dict_of_results[bar.baz] = 1


     if results >= batch_size:
        cursor = foos.cursor()
        break

  else:
     # for/else: runs only if the for loop finished without hitting break
     break

and in some_module.py

def get_bar(foo):
  # return the first bar on foo with status == 10, or None

  for bar in foo.bars:
    if bar.status == 10:
       return bar

  return None  

Output of logging.debug (shortened)

on result #1 used memory of 43
on result #2 used memory of 43
.....
on result #20 used memory of 43
on result #21 used memory of 49
.....
on result #32 used memory of 49
on result #33 used memory of 54
.....
on result #44 used memory of 54
on result #45 used memory of 59
.....
on result #55 used memory of 59
.....
.....
.....

on result #597 used memory of 284.3
Exceeded soft private memory limit of 256 MB with 313 MB after servicing 1 requests total

Solution

It looks like your batch solution is conflicting with db's batching, resulting in a lot of extra batches hanging around.

When you run query.run(batch_size=batch_size), db runs the query through to completion (up to any limit). When you reach the end of a batch, db grabs the next batch. However, right after db does this, you exit the loop and start again. This means that batches 1 -> n will all exist in memory twice: once from the last query's fetch, and once from your next query's fetch.
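One way to confirm this (a rough diagnostic sketch, assuming the Foo model from the question; count_live_foos is a hypothetical helper, not an App Engine API) is to count live Foo entities via the gc module at the top of each pass through the while loop:

import gc

def count_live_foos():
  # gc.get_objects() returns every container object the collector tracks;
  # matching on the class name is crude, but fine as a one-off probe.
  return sum(1 for obj in gc.get_objects()
             if type(obj).__name__ == 'Foo')

If the double-batching described above is happening, the count grows by roughly one extra batch per pass instead of staying flat.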

If you want to loop over all your entities, just let db handle the batching:

foos = models.Foo.all().filter('status =', 6)
for foo in foos.run(batch_size=batch_size):
  results += 1
  bar = some_module.get_bar(foo)
  if bar:
    try:
      dict_of_results[bar.baz] += 1
    except KeyError:
      dict_of_results[bar.baz] = 1
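If you want a resumable checkpoint while still letting db do the batching, the query object can report its current cursor from inside the loop, just as the question's code does (a sketch: the interval of 100 is arbitrary, and save_checkpoint is a hypothetical function):

foos = models.Foo.all().filter('status =', 6)
for i, foo in enumerate(foos.run(batch_size=batch_size)):
  # ... per-entity work goes here ...
  if i and i % 100 == 0:
    save_checkpoint(foos.cursor())  # resumable position; no extra fetch, no restart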

Or, if you want to handle batching yourself, make sure db doesn't do any batching:

cursor = None
while True:
  foo_query = models.Foo.all().filter('status =', 6)
  if cursor:
    foo_query.with_cursor(cursor)
  foos = foo_query.fetch(limit=batch_size)
  if not foos:
    break

  # ... process this batch of foos ...

  cursor = foo_query.cursor()  # the cursor lives on the query, not on the fetched list
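Plugging the question's per-entity work into that skeleton gives something like the following (a sketch under the question's assumptions: models.Foo, some_module.get_bar and batch_size are defined as above):

cursor = None
dict_of_results = {}

while True:
  foo_query = models.Foo.all().filter('status =', 6)
  if cursor:
    foo_query.with_cursor(cursor)
  foos = foo_query.fetch(limit=batch_size)
  if not foos:
    break

  for foo in foos:
    bar = some_module.get_bar(foo)
    if bar:
      # dict.get avoids the try/except KeyError pattern from the question
      dict_of_results[bar.baz] = dict_of_results.get(bar.baz, 0) + 1

  cursor = foo_query.cursor()

Because fetch() returns a plain list, rebinding foos on each pass drops the previous batch, so memory stays bounded by roughly one batch at a time.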
