Django and parallel processing

Question

Versions:

  • Python 3.5.1
  • Django 1.10
  • mysqlclient 1.3.10
  • MySQL 5.7.18-0ubuntu0.16.04.1 (Ubuntu)
  • Linux Mint 18.1

I have a large Django project with a setup script that adds a bunch of content to the database from some CSV files. Once in a while, I need to reset everything and re-add everything from these files. The data also requires some post-processing once added. This takes a while, however, because the files are long and the code contains some unavoidable nested loops as well as many database queries.

In many cases the tasks are independent, so it should be possible to run them in parallel. I looked around for parallel processing libraries and decided to use the very simple multiprocessing module from the standard library.

The setup is therefore quite simple: we define a function to run in parallel and then map it over the inputs with a Pool. Simplified code:

from multiprocessing import Pool

def some_func(input):
    # code inserting data into Django here
    pass

if __name__ == '__main__':
    with Pool(4) as p:
        p.map(some_func, [1, 2, 3, 4])

However, running the code results in database connection errors like those reported in several other questions:

_mysql_exceptions.OperationalError: (2013, 'Lost connection to MySQL server during query')

It seems like the different worker processes are trying to share one connection, or maybe the connection is not passed on to the workers.
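The first guess is consistent with how `fork` works: a forked worker inherits every file descriptor the parent has open, including the socket behind a database connection that earlier queries opened. The minimal sketch below illustrates this without Django or MySQL. It assumes Linux and the `fork` start method (the Linux default for the Python version listed above), uses a `socket.socketpair()` as a stand-in for the MySQL connection, and the function names `socket_identity` and `main` are illustrative only. Each worker reports the inode of the socket it sees; it matches the parent's, meaning every process holds the very same socket.

```python
import multiprocessing
import os
import socket

# Opened in the parent before the pool exists; a stand-in for the MySQL
# connection Django opens the first time the setup script runs a query.
parent_sock, peer_sock = socket.socketpair()


def socket_identity(_):
    # Runs in a worker: report this process's PID and the inode of the
    # socket it inherited from the parent through fork().
    return os.getpid(), os.fstat(parent_sock.fileno()).st_ino


def main():
    parent_inode = os.fstat(parent_sock.fileno()).st_ino
    ctx = multiprocessing.get_context("fork")  # assumption: Linux with fork
    with ctx.Pool(4) as p:
        results = p.map(socket_identity, range(4))
    worker_pids = {pid for pid, _ in results}
    worker_inodes = {inode for _, inode in results}
    return parent_inode, worker_pids, worker_inodes


if __name__ == "__main__":
    inode, pids, inodes = main()
    print("parent socket inode:", inode)
    print("worker PIDs:", pids)
    print("inodes seen by workers:", inodes)
```

The worker PIDs differ from the parent's, yet the set of inodes they report contains only the parent's socket inode, so all of them would be reading and writing over a single connection.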

How do I get parallel processing to work with Django database actions?

Answer

After googling around, I was able to find an old (2009) related question on the Django Google groups:

Hi, I was recently debugging a similar issue and came to the conclusion (which may be wrong, of course :) that multiprocessing and Django DB connections don't play well together. I ended up closing the Django DB connection as the first thing in the new process. It'll recreate a new connection when it needs one, but that one will have no references to the connection used by the parent.

So, my Process.start() calls a function which starts with:

from django.db import connection

connection.close()

This fixed my problem.

Thus, to solve the issue, change the function to be something like this:

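def some_func(input):
    # Close the database connection inherited from the parent process;
    # Django opens a fresh one for this process on the next query.
    from django.db import connection
    connection.close()

    # code inserting data into Django here
    pass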
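One variant, sketched here under stated assumptions, also closes the connection in the parent right before the pool is created, so there is no open socket for the workers to inherit, and moves the per-worker close into the pool's `initializer`, which `multiprocessing.Pool` runs once per worker process rather than once per task. In a Django script the equivalent calls would presumably be `for conn in connections.all(): conn.close()` (with `connections` imported from `django.db`), run in the parent and again in the initializer; that mapping is an assumption to check against your Django version. The sketch is not Django code: to stay self-contained it uses `sqlite3` and a hypothetical `LazyConnection` class that, like Django's connection, opens the real connection on first use. It also assumes the `fork` start method (Linux).

```python
import multiprocessing
import os
import sqlite3
import tempfile


class LazyConnection:
    """Hypothetical stand-in for django.db.connection: the real database
    connection is opened on first use and can be closed and reopened."""

    def __init__(self):
        self.path = None
        self._conn = None

    def cursor(self):
        if self._conn is None:  # connect lazily on the first query
            self._conn = sqlite3.connect(self.path, timeout=30)
        return self._conn.cursor()

    def commit(self):
        self._conn.commit()

    def close(self):
        if self._conn is not None:
            self._conn.close()
            self._conn = None


connection = LazyConnection()


def close_connection():
    # Pool initializer: runs once in each new worker, before any task.
    # It is a no-op here because the parent already closed its connection.
    connection.close()


def some_func(n):
    # The first query in this worker opens the worker's own connection.
    cur = connection.cursor()
    cur.execute("INSERT INTO items (value) VALUES (?)", (n * n,))
    connection.commit()
    return n * n


def main():
    connection.path = os.path.join(tempfile.mkdtemp(), "demo.sqlite3")
    cur = connection.cursor()
    cur.execute("CREATE TABLE items (value INTEGER)")
    connection.commit()
    connection.close()  # close before forking: nothing open to inherit

    ctx = multiprocessing.get_context("fork")  # assumption: Linux with fork
    with ctx.Pool(4, initializer=close_connection) as p:
        results = p.map(some_func, [1, 2, 3, 4])

    cur = connection.cursor()  # the parent reconnects lazily
    rows = cur.execute("SELECT value FROM items ORDER BY value").fetchall()
    connection.close()
    return results, [value for (value,) in rows]


if __name__ == "__main__":
    print(main())
```

Run as a script, it prints `([1, 4, 9, 16], [1, 4, 9, 16])`: the values returned through `map` and the rows the workers inserted. The point of the design is that no open connection ever crosses the fork; each process, parent included, opens its own when it next needs one.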

It then works fine.
