Limiting Memory Use in a *Large* Django QuerySet


Question



I have a task which needs to be run on 'most' objects in my database once every so often (once a day, once a week, whatever). Basically this means that I have some query that looks like this running in its own thread.

for model_instance in SomeModel.objects.all():
    do_something(model_instance)

(Note that it's actually a filter(), not all(), but nonetheless I still end up selecting a very large set of objects.)

The problem I'm running into is that after running for a while, the thread is killed by my hosting provider because I'm using too much memory. I'm assuming all this memory use is happening because, even though the QuerySet object returned by my query initially has a very small memory footprint, it ends up growing as the QuerySet object caches each model_instance as I iterate through them.
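To illustrate what I mean, here is a sketch of the behaviour as I understand it (not my real code; SomeModel and do_something just stand in for my actual names):

qs = SomeModel.objects.all()   # tiny footprint at this point: no rows fetched yet

for model_instance in qs:
    # Each instance the loop yields is also kept in the QuerySet's internal
    # result cache, so by the end of the loop every row is in memory at once,
    # even if do_something() keeps no references of its own.
    do_something(model_instance)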

My question is, "what is the best way to iterate through almost every SomeModel in my database in a memory-efficient way?" Or perhaps my question is, "how do I 'un-cache' model instances from a Django queryset?"

EDIT: I'm actually using the results of the queryset to build a series of new objects. As such, I don't end up updating the queried-for objects at all.

Solution

So what I actually ended up doing is building something that you can 'wrap' a QuerySet in. It works by making a deepcopy of the QuerySet and slicing it--e.g., some_queryset[15:45]--and then, once that slice has been completely iterated through, making another deepcopy of the original QuerySet for the next slice. This means that only the set of objects returned in 'this' particular slice is stored in memory.

import copy
import logging

logger = logging.getLogger(__name__)


class MemorySavingQuerysetIterator(object):

    def __init__(self, queryset, max_obj_num=1000):
        self._base_queryset = queryset
        self.max_obj_num = max_obj_num
        self._generator = self._setup()

    def _setup(self):
        for i in range(0, self._base_queryset.count(), self.max_obj_num):
            # By making a copy of the queryset and using that to actually access
            # the objects, we ensure that there are at most `max_obj_num` objects
            # in memory at any given time.
            smaller_queryset = copy.deepcopy(self._base_queryset)[i:i + self.max_obj_num]
            logger.debug('Grabbing next %s objects from DB', self.max_obj_num)
            for obj in smaller_queryset.iterator():
                yield obj

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._generator)

    next = __next__  # Python 2 compatibility

So instead of...

for obj in SomeObject.objects.filter(foo='bar'):  # <-- something that returns *a lot* of objects
    do_something(obj)

You would do...

for obj in MemorySavingQuerysetIterator(SomeObject.objects.filter(foo='bar')):
    do_something(obj)

Please note that the intention of this is to save memory in your Python interpreter. It essentially does this by making more database queries. Usually people are trying to do the exact opposite of that--i.e., minimize database queries as much as possible without regard to memory usage. Hopefully somebody will find this useful though.
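As a footnote: on newer Django versions (2.0 and later), QuerySet.iterator() accepts a chunk_size argument, which gets you a similar effect out of the box on backends with server-side cursor support such as PostgreSQL: rows are fetched in batches and the QuerySet cache is never populated. A minimal sketch, reusing the SomeObject/do_something placeholders from above:

for obj in SomeObject.objects.filter(foo='bar').iterator(chunk_size=1000):
    do_something(obj)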
