In Google App Engine, how do I reduce memory consumption as I write a file out to the blobstore rather than exceed the soft memory limit?


Problem description


I'm using the blobstore to back up and restore entities in CSV format. The process works well for all of my smaller models. However, once I start working on models with more than 2K entities, I exceed the soft memory limit. I'm only fetching 50 entities at a time and then writing the results out to the blobstore, so it's not clear to me why my memory usage keeps building up. I can reliably make the method fail just by increasing the "limit" value passed in below, which makes the method run only a little longer and export a few more entities.

  1. Any recommendations on how to optimize this process to reduce memory consumption?

  2. Also, the files produced are only <500 KB in size. Why does the process use 140 MB of memory?

Simplified example:

import csv

from google.appengine.api import files

# 'properties' holds the CSV fieldnames; backup.get_dict_for_entity() maps an entity to a row dict.
file_name = files.blobstore.create(mime_type='application/octet-stream')
with files.open(file_name, 'a') as f:
    writer = csv.DictWriter(f, fieldnames=properties)
    for entity in models.Player.all():
        row = backup.get_dict_for_entity(entity)
        writer.writerow(row)

Produces the error: Exceeded soft private memory limit with 150.957 MB after servicing 7 requests total

Simplified example 2:

The problem seems to be related to using the files API and the with statement in Python 2.5. Factoring out the csv logic, I can reproduce almost the same error simply by writing a 4000-line text file to the blobstore.

from __future__ import with_statement

import StringIO

from google.appengine.api import files
from google.appengine.ext.blobstore import blobstore

file_name = files.blobstore.create(mime_type='application/octet-stream')
myBuffer = StringIO.StringIO()

# Put 4000 lines of text in myBuffer

with files.open(file_name, 'a') as f:
    for line in myBuffer.getvalue().splitlines():
        f.write(line)

files.finalize(file_name)
blob_key = files.blobstore.get_blob_key(file_name)

Produces the error: Exceeded soft private memory limit with 154.977 MB after servicing 24 requests total

Original:

import csv
import logging

from google.appengine.api import files

def backup_model_to_blobstore(model, limit=None, batch_size=None):
    file_name = files.blobstore.create(mime_type='application/octet-stream')
    # Open the file and write to it
    with files.open(file_name, 'a') as f:
      #Get the fieldnames for the csv file.
      query = model.all().fetch(1)
      entity = query[0]
      properties = entity.__class__.properties()
      #Add ID as a property
      properties['ID'] = entity.key().id()

      #For debugging rather than try and catch
      if True:
        writer = csv.DictWriter(f, fieldnames=properties)
        #Write out a header row
        headers = dict( (n,n) for n in properties )
        writer.writerow(headers)

        numBatches = int(limit/batch_size)
        if numBatches == 0:
            numBatches = 1

        for x in range(numBatches):
          logging.info("************** querying with offset %s and limit %s", x*batch_size, batch_size)
          query = model.all().fetch(limit=batch_size, offset=x*batch_size)
          for entity in query:
            #This just returns a small dictionary with the key-value pairs
            row = get_dict_for_entity(entity)
            #write out a row for each entity.
            writer.writerow(row)

    # Finalize the file. Do this before attempting to read it.
    files.finalize(file_name)

    blob_key = files.blobstore.get_blob_key(file_name)
    return blob_key

The error looks like this in the logs:

......
2012-02-02 21:59:19.063
************** querying with offset 2050 and limit 50
I 2012-02-02 21:59:20.076
************** querying with offset 2100 and limit 50
I 2012-02-02 21:59:20.781
************** querying with offset 2150 and limit 50
I 2012-02-02 21:59:21.508
Exception for: Chris (202.161.57.167)

err:
Traceback (most recent call last):
  .....
    blob_key = backup_model_to_blobstore(model, limit=limit, batch_size=batch_size)
  File "/base/data/home/apps/singpath/163.356548765202135434/singpath/backup.py", line 125, in backup_model_to_blobstore
    writer.writerow(row)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/files/file.py", line 281, in __exit__
    self.close()
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/files/file.py", line 275, in close
    self._make_rpc_call_with_retry('Close', request, response)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/files/file.py", line 388, in _make_rpc_call_with_retry
    _make_call(method, request, response)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/files/file.py", line 236, in _make_call
    _raise_app_error(e)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/files/file.py", line 179, in _raise_app_error
    raise FileNotOpenedError()
FileNotOpenedError

C 2012-02-02 21:59:23.009
Exceeded soft private memory limit with 149.426 MB after servicing 14 requests total

Solution

You'd be better off not doing the batching yourself, but just iterating over the query. The iterator will pick a batch size (probably 20) that should be adequate:

q = model.all()
for entity in q:
    row = get_dict_for_entity(entity)
    writer.writerow(row)
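
For reference, here is a minimal sketch of the whole backup function rebuilt around plain iteration; get_dict_for_entity() is the question's own helper, and the debugging scaffolding from the original is omitted:

import csv

from google.appengine.api import files

def backup_model_to_blobstore(model):
    file_name = files.blobstore.create(mime_type='application/octet-stream')
    with files.open(file_name, 'a') as f:
        # Fieldnames: the model class's properties plus the ID column used in the original.
        fieldnames = sorted(model.properties().keys()) + ['ID']
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writerow(dict((n, n) for n in fieldnames))  # header row
        # Let the query iterator manage batching instead of fetch()/offset.
        for entity in model.all():
            writer.writerow(get_dict_for_entity(entity))
    # Finalize before attempting to read the blob.
    files.finalize(file_name)
    return files.blobstore.get_blob_key(file_name)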

This avoids re-running the query with an ever-increasing offset, which is slow and causes quadratic behavior in the datastore.
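
If the export ever does have to be split across multiple requests (for example, chained task queue tasks), datastore cursors give you forward-only paging without the offset cost. A rough sketch, assuming the db API's with_cursor()/cursor() methods and reusing the question's get_dict_for_entity() helper:

def write_batch(model, writer, start_cursor=None, batch_size=50):
    """Write one batch of entities and return a cursor for the next call."""
    q = model.all()
    if start_cursor:
        q.with_cursor(start_cursor)   # resume right after the last entity written
    for entity in q.fetch(batch_size):
        writer.writerow(get_dict_for_entity(entity))
    return q.cursor()                 # hand this to the next request/task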

An oft-overlooked fact about memory usage is that the in-memory representation of an entity can use 30-50 times as much RAM as its serialized form; e.g. an entity that is 3 KB on disk might use 100 KB in RAM. (The exact blow-up factor depends on many things; it's worse if you have lots of properties with long names and small values, and worse still for repeated properties with long names.)
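
To get a feel for the numbers for your own entities, you can log the serialized size of one entity and scale it by that blow-up factor. A quick sketch (the 30-50x multiplier is only the rule of thumb above, not an exact figure):

import logging

from google.appengine.ext import db

entity = models.Player.all().get()
serialized_bytes = len(db.model_to_protobuf(entity).Encode())
# Rough in-memory estimate from the 30-50x rule of thumb.
logging.info('~%d bytes serialized, very roughly %d-%d KB in RAM',
             serialized_bytes,
             serialized_bytes * 30 / 1024,
             serialized_bytes * 50 / 1024)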
