Django multiprocessing and database connections



Background:

I'm working on a project which uses Django with a Postgres database. We're also using mod_wsgi in case that matters, since some of my web searches have made mention of it. On web form submit, the Django view kicks off a job that will take a substantial amount of time (more than the user would want to wait), so we kick off the job via a system call in the background. The job that is now running needs to be able to read and write to the database. Because this job takes so long, we use multiprocessing to run parts of it in parallel.

Problem:

The top level script has a database connection, and when it spawns off child processes, it seems that the parent's connection is available to the children. Then there's an exception about how SET TRANSACTION ISOLATION LEVEL must be called before a query. Research has indicated that this is due to trying to use the same database connection in multiple processes. One thread I found suggested calling connection.close() at the start of the child processes so that Django will automatically create a new connection when it needs one, and therefore each child process will have a unique connection - i.e. not shared. This didn't work for me, as calling connection.close() in the child process caused the parent process to complain that the connection was lost.

Other Findings:

Some stuff I read seemed to indicate you can't really do this, and that multiprocessing, mod_wsgi, and Django don't play well together. That just seems hard to believe I guess.

Some suggested using celery, which might be a long term solution, but I am unable to get celery installed at this time, pending some approval processes, so not an option right now.

Found several references on SO and elsewhere about persistent database connections, which I believe to be a different problem.

Also found references to psycopg2.pool and pgpool and something about bouncer. Admittedly, I didn't understand most of what I was reading on those, but it certainly didn't jump out at me as being what I was looking for.

Current "Work-Around":

For now, I've reverted to just running things serially, and it works, but is slower than I'd like.

Any suggestions as to how I can use multiprocessing to run in parallel? Seems like if I could have the parent and two children all have independent connections to the database, things would be ok, but I can't seem to get that behavior.

Thanks, and sorry for the length!

Solution

Multiprocessing copies connection objects between processes because it forks the process, and therefore copies all of the parent process's file descriptors. That being said, a connection to the SQL server is just a file; on Linux you can see it under /proc//fd/.... Any open file will be shared between forked processes. You can find more about forking here.

My solution was simply to close the database connections just before launching the processes; each process then recreates a connection itself when it needs one (tested in Django 1.4):

from multiprocessing import Process
from django import db

def db_worker():
    some_parallel_code()

# close inherited connections before forking
db.connections.close_all()
Process(target=db_worker, args=()).start()

Pgbouncer/pgpool is not related to processes or threads in the multiprocessing sense. It is rather a solution for not closing the connection on each request, i.e. for speeding up connections to Postgres under high load.
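For reference, a minimal pgbouncer.ini sketch for that pooling use case; the host, database name, and pool sizes are illustrative, not taken from the question:

```ini
[databases]
; route clients connecting to "mydb" to the local Postgres
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; transaction pooling reuses server connections between clients
pool_mode = transaction
max_client_conn = 100
default_pool_size = 20
```

Django would then point its DATABASES HOST/PORT at pgbouncer (port 6432 here) instead of Postgres directly; this does not fix the fork-sharing problem above, it only cheapens reconnecting.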

Update:

To completely remove problems with the database connection, simply move all database-related logic into db_worker. I originally wanted to pass a QuerySet as an argument, but a better idea is simply to pass a list of ids: use values_list('id', flat=True), and do not forget to cast it to a list (list(qs)) before passing it to db_worker. Thanks to that, we never copy the models' database connection into the children.

from multiprocessing import Process
from django import db

def db_worker(model_ids):
    # here you do Model.objects.filter(id__in=model_ids)
    obj = PartModelWorkerClass(model_ids)
    obj.run()

model_ids = Model.objects.all().values_list('id', flat=True)
model_ids = list(model_ids)  # cast to list so no live queryset is passed
process_count = 5
delta = (len(model_ids) // process_count) + 1  # ids per worker

# do all the db stuff here ...

# here you can close the db connections
db.connections.close_all()

for it in range(process_count):
    Process(target=db_worker, args=(model_ids[it*delta:(it+1)*delta],)).start()
