Django and parallel processing
Question
Versions:
- Python 3.5.1
- Django 1.10
- mysqlclient 1.3.10
- MySQL 5.7.18-0ubuntu0.16.04.1 (Ubuntu)
- Linux Mint 18.1
I have a large Django project with a setup script that adds a bunch of content to the database from some CSV files. Once in a while, I need to reset everything and re-add everything from these files. The data also requires some post-processing once added. This takes a while, however, because the files are long and the code contains some unavoidable nested loops as well as many database queries.
In many cases, the tasks are independent, so it should be possible to run them in parallel. I looked around for parallel processing libraries and decided to use the very simple multiprocessing module.
Thus, the setup is quite simple. We define some function to run in parallel, and then call Pool. Simplified code:
from multiprocessing import Pool

def some_func(input):
    # code inserting data into Django here
    pass

with Pool(4) as p:
    p.map(some_func, [1, 2, 3, 4])
However, running the code results in database connection errors like those reported elsewhere:
_mysql_exceptions.OperationalError: (2013, 'Lost connection to MySQL server during query')
It seems like the different worker processes are trying to share one connection, or maybe the connection is not passed on to the workers.
How do I get parallel processing to work with Django database actions?
Recommended answer
After googling around, I was able to find an old (2009) related question on the Django Google groups:
Hi, I was recently debugging similar issue and came to a conclusion (which may be wrong of course :) that multiprocessing and Django DB connections don't play well together. I ended up closing Django DB connection first thing in the new process. It'll recreate a new connection when it needs one, but that one will have no references to the connection used by the parent.
So, my Process.start() calls a function which starts with:
from django.db import connection
connection.close()
This solved my problem.
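The "recreate when needed" behaviour described in the quote can be pictured with a toy model. This is only a sketch and not Django's implementation: LazyConnection is a hypothetical class built on the standard-library sqlite3 module, and it is here just to show why closing a lazily opened connection is harmless, since the next query simply opens a new one.

```python
import sqlite3


class LazyConnection:
    """Hypothetical toy model of a connection that opens itself on first use."""

    def __init__(self, path):
        self.path = path
        self._conn = None   # nothing is opened until a cursor is requested
        self.opened = 0     # how many real connections have been created

    def cursor(self):
        # Open a real connection only when one is actually needed.
        if self._conn is None:
            self._conn = sqlite3.connect(self.path)
            self.opened += 1
        return self._conn.cursor()

    def close(self):
        # Closing forgets the current connection; the next cursor() reconnects.
        if self._conn is not None:
            self._conn.close()
            self._conn = None


conn = LazyConnection(":memory:")
conn.cursor().execute("SELECT 1")  # first use opens a connection
conn.close()                       # e.g. the first thing done in a new process
conn.cursor().execute("SELECT 1")  # next use opens a fresh connection
```

In this model, a worker that calls close() before doing anything else never reuses whatever connection object it started with.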
Thus, to solve the issue, change the function to be something like this:
def some_func(input):
    # close the database connection inherited from the parent process;
    # Django opens a new one the next time a query runs
    from django.db import connection
    connection.close()
    # code inserting data into Django here
    pass
Then it works fine.
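If several worker functions need the same treatment, the "close first, then work" step could be factored into a decorator. The following is a minimal sketch under stated assumptions, not part of the original answer: closes_connection_first and FakeConnection are hypothetical names, and the fake object stands in for django.db.connection so that the example runs without Django. In a real project one would presumably pass connection.close from django.db instead.

```python
import functools


def closes_connection_first(close_connection):
    """Build a decorator that calls close_connection() before each task.

    close_connection is any zero-argument callable. In a Django project it
    would be something like django.db.connection.close (an assumption; it is
    not imported here so the sketch runs without Django).
    """
    def decorator(task):
        @functools.wraps(task)
        def wrapper(*args, **kwargs):
            close_connection()  # drop the connection inherited from the parent
            return task(*args, **kwargs)
        return wrapper
    return decorator


class FakeConnection:
    """Hypothetical stand-in for django.db.connection, for illustration only."""

    def __init__(self):
        self.close_calls = 0

    def close(self):
        self.close_calls += 1


fake_connection = FakeConnection()


@closes_connection_first(fake_connection.close)
def some_func(value):
    # code inserting data into Django would go here
    return value * 2
```

Note that this closes the connection before every task, just like the function above, so each task pays for a reconnect. Passing a closing function as the initializer argument of multiprocessing.Pool would instead run it once per worker process, which may be cheaper when there are many small tasks.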