Python/Django polling of database has memory leak


Question


I've got a Python script running Django for database and memcache, but it's notably running as a standalone daemon (i.e. not responding to webserver requests). The daemon checks a Django model, Requisition, for objects with status=STATUS_NEW, then marks them STATUS_WORKING and puts them into a queue.


A number of processes (created using the multiprocessing package) will pull things out of the Queue and do work on the Requisition with the pr.id that was passed to the Queue. I believe the memory leak is probably in the following code (but it could be in the 'Worker' code on the other side of the Queue, though this is unlikely because the memory size is growing even when no Requisitions are coming up -- i.e. when the workers are all blocking on Queue.get()).

import time

from django.conf import settings
from multiprocessing import Queue

from requisitions.models import Requisition  # our Django model

queue = Queue()

while True:
    # Wait for "N"ew requisitions, then pop them into the queue.
    for pr in Requisition.objects.all().filter(status=Requisition.STATUS_NEW):
        pr.set_status(pr.STATUS_WORKING)
        pr.save()
        queue.put(pr.id)

    time.sleep(settings.DAEMON_POLL_WAIT)

Where settings.DAEMON_POLL_WAIT = 0.01.
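For reference, the Worker processes on the other side of the Queue look roughly like this (a simplified sketch only; the real Worker code isn't shown in this question and the worker name is a placeholder):

from multiprocessing import Process, Queue

from requisitions.models import Requisition

def worker(queue):
    while True:
        pr_id = queue.get()  # workers block here when nothing is queued
        pr = Requisition.objects.get(pk=pr_id)
        # ... do the actual work on pr here ...

# a pool of workers sharing the same queue
queue = Queue()
workers = [Process(target=worker, args=(queue,)) for _ in range(4)]
for w in workers:
    w.start()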


It seems that if I leave this running for a period of time (i.e. a couple of days), the Python process will grow without bound and eventually the system will run out of memory.


What's going on here (or how can I find out), and more importantly - how can you run a daemon that does this?


My first thought is to change the dynamic of the function, notably by putting the check for new Requisition objects into a django.core.cache cache, i.e.

import time

from django.conf import settings
from django.core.cache import cache

from requisitions.models import Requisition

def process_new_requisitions(queue):
    for pr in Requisition.objects.all().filter(status=Requisition.STATUS_NEW):
        pr.set_status(pr.STATUS_WORKING)
        pr.save()
        queue.put(pr.id)

while True:
    time.sleep(settings.DAEMON_POLL_WAIT)
    if cache.get('new_requisitions'):
        # Possible race condition between the get() and the clear()
        cache.clear()
        process_new_requisitions(queue)


The process that's creating Requisitions with status=STATUS_NEW can do a cache.set('new_requisitions', 1) (or alternatively we could catch a signal or Requisition.save() event where a new Requisition is being created and then set the flag in the cache from there).
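For the signal option, a post_save handler could set the flag, roughly like this (a sketch; flag_new_requisition is a made-up name):

from django.core.cache import cache
from django.db.models.signals import post_save

from requisitions.models import Requisition

def flag_new_requisition(sender, instance, created, **kwargs):
    # When a Requisition is created with status=STATUS_NEW, set the flag
    # that the polling loop checks.
    if created and instance.status == Requisition.STATUS_NEW:
        cache.set('new_requisitions', 1)

post_save.connect(flag_new_requisition, sender=Requisition)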


However, I'm not sure that the solution I've proposed here addresses the memory issues (which are probably related to garbage collection, so the scoping introduced by process_new_requisitions may solve the problem).
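A rough way to check whether the growth comes from objects Python still references (rather than something the garbage collector could reclaim) is to force a collection and watch the tracked object count between poll cycles, e.g.:

import gc

gc.collect()                  # force a full collection
print(len(gc.get_objects()))  # number of objects the collector is tracking

If that count keeps climbing cycle after cycle, something is holding references between iterations.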


I'm grateful for any thoughts and feedback.

Answer


You need to regularly reset the list of queries that Django keeps for debugging purposes. Normally it is cleared after every request, but since your application is not request-based, you need to do this manually:

from django import db

db.reset_queries()  # clears the list of queries Django has logged on each connection
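For example, in the polling loop from your question the call can go at the end of each cycle (a sketch based on your loop):

import time

from django import db
from django.conf import settings
from multiprocessing import Queue

from requisitions.models import Requisition

queue = Queue()

while True:
    for pr in Requisition.objects.all().filter(status=Requisition.STATUS_NEW):
        pr.set_status(pr.STATUS_WORKING)
        pr.save()
        queue.put(pr.id)

    db.reset_queries()  # drop the query log accumulated during this cycle
    time.sleep(settings.DAEMON_POLL_WAIT)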

See also:

  • "Debugging Django memory leak with TrackRefs and Guppy" by Mikko Ohtamaa:


    Django keeps track of all queries for debugging purposes (connection.queries). This list is reset at the end of each HTTP request. But in standalone mode, there are no requests, so you need to manually reset the queries list after each working cycle.



  • "Why is Django leaking memory?" in Django FAQ - it talks both about setting DEBUG to False, which is always important, and about clearing the list of queries using db.reset_queries(), important in applications like yours.
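To confirm that the query log is what is growing, you can also watch the length of connection.queries between cycles; with DEBUG = False in your settings it should stay at zero:

from django.conf import settings
from django.db import connection

# With DEBUG = True every executed query is appended to connection.queries;
# with DEBUG = False the list stays empty.
print(settings.DEBUG, len(connection.queries))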

