Python process keeps growing in django db upload script


Problem description

I'm running a conversion script that commits large amounts of data to a database using Django's ORM. I use manual commits to speed up the process. I have hundreds of files to commit, and each file will create more than a million objects.

I'm using Windows 7 64-bit. I noticed that the Python process keeps growing until it consumes more than 800 MB, and that is only for the first file!

The script loops over the records in a text file, reusing the same variables and without accumulating any lists or tuples.

I read here that this is a general problem for Python (and perhaps for any program), but I was hoping Django or Python would have some explicit way to reduce the process size...

Here is an outline of the code:

import sys,os
sys.path.append(r'D:\MyProject')
os.environ['DJANGO_SETTINGS_MODULE']='my_project.settings'
from django.core.management import setup_environ
from convert_to_db import settings
from convert_to_db.convert.models import Model1, Model2, Model3
setup_environ(settings)
from django.db import transaction

@transaction.commit_manually
def process_file(filename):
    data_file = open(filename,'r')

    model1, created = Model1.objects.get_or_create([some condition])
    if created:
        model1.save()

    input_row_i = 0
    while 1:
        line = data_file.readline()
        if line == '':
            break
        input_row_i += 1
        if not (input_row_i % 5000):
            transaction.commit()  # commit in batches of 5000 rows
        line = line[:-1] # remove \n
        elements = line.split(',')

        d0 = elements[0]
        d1 = elements[1]
        d2 = elements[2]

        model2, created = Model2.objects.get_or_create([some condition])
        if created:
            model2.save()

        model3 = Model3(d0=d0, d1=d1, d2=d2)
        model3.save()

    data_file.close()
    transaction.commit()

# Some code that calls process_file() per file

Answer

First, make sure DEBUG = False in your settings.py. When DEBUG = True, every query sent to the database is stored in django.db.connection.queries. That list turns into a large amount of memory if you import many records. You can check it from the shell:

$ ./manage.py shell
>>> from django.conf import settings
>>> settings.DEBUG
True
>>> settings.DEBUG = False
>>> # django.db.connection.queries will now remain empty ([])
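If you cannot turn DEBUG off for the run, one fallback (a minimal sketch of my own, not part of the original answer) is to clear that stored query list periodically with django.db.reset_queries(), for example at the same point where the batch is committed:

from django.db import reset_queries, transaction

# hypothetical placement: inside the import loop, next to the periodic commit
if not (input_row_i % 5000):
    transaction.commit()
    reset_queries()  # empties django.db.connection.queries so it cannot grow unbounded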

If that does not help, try spawning a new Process to run process_file for each file. This is not the most efficient approach, but you are trying to keep memory usage down, not CPU cycles. Something like this should get you started:

from multiprocessing import Process

for filename in files_to_process:
    p = Process(target=process_file, args=(filename,))
    p.start()
    p.join()
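Because p.join() follows each p.start(), the files are processed one at a time, and whatever memory a file needed is returned to the operating system when its child process exits.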

