Python process keeps growing in django db upload script
Question
I'm running a conversion script that commits large amounts of data to a database using Django's ORM. I use manual transaction commits to speed up the process. I have hundreds of files to commit, and each file will create more than a million objects.
I'm using Windows 7 64-bit. I noticed that the Python process keeps growing until it consumes more than 800 MB, and this is only for the first file!
The script loops over the records in a text file, reusing the same variables and not accumulating any lists or tuples.
I read here that this is a general problem for Python (and perhaps for any program), but I was hoping Django or Python had some explicit way to reduce the process size...
Here is an outline of the code:
import sys, os
sys.path.append(r'D:\MyProject')
os.environ['DJANGO_SETTINGS_MODULE'] = 'my_project.settings'

from django.core.management import setup_environ
from convert_to_db import settings
setup_environ(settings)  # must run before the models are imported

from convert_to_db.convert.models import Model1, Model2, Model3
from django.db import transaction

@transaction.commit_manually
def process_file(filename):
    data_file = open(filename, 'r')
    model1, created = Model1.objects.get_or_create([some condition])
    if created:
        model1.save()
    input_row_i = 0
    for line in data_file:
        input_row_i += 1
        if not (input_row_i % 5000):
            transaction.commit()  # flush every 5000 rows
        line = line.rstrip('\n')
        elements = line.split(',')
        d0 = elements[0]
        d1 = elements[1]
        d2 = elements[2]
        model2, created = Model2.objects.get_or_create([some condition])
        if created:
            model2.save()
        model3 = Model3(d0=d0, d1=d1, d2=d2)
        model3.save()
    data_file.close()
    transaction.commit()

# Some code that calls process_file() per file
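A side note on the script itself: get_or_create already saves newly created rows, so the extra save() calls above are redundant, and each Model3 row still costs one INSERT query. If you are on Django 1.4 or later, bulk_create can batch those inserts. A minimal sketch of the same loop, where process_file_bulk and batch_size are hypothetical names, not part of the original script:

from convert_to_db.convert.models import Model3

def process_file_bulk(filename, batch_size=5000):
    # Build unsaved Model3 instances and insert each batch with a
    # single query; the list is cleared after every flush, so memory
    # stays bounded regardless of file size.
    batch = []
    with open(filename, 'r') as data_file:
        for line in data_file:
            elements = line.rstrip('\n').split(',')
            batch.append(Model3(d0=elements[0], d1=elements[1], d2=elements[2]))
            if len(batch) >= batch_size:
                Model3.objects.bulk_create(batch)
                batch = []
    if batch:
        Model3.objects.bulk_create(batch)  # flush the remainder

Note that bulk_create bypasses Model.save() and the pre/post-save signals, so it only fits models without custom save logic.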
Answer
First thing, make sure DEBUG=False in your settings.py. When DEBUG=True, every query sent to the database is stored in django.db.connection.queries. This turns into a large amount of memory if you import many records. You can check it via the shell:
$ ./manage.py shell
>>> from django.conf import settings
>>> settings.DEBUG
True
>>> settings.DEBUG = False
>>> # django.db.connection.queries will now remain empty / []
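If you must keep DEBUG=True for some reason, Django also provides django.db.reset_queries(), which empties that per-connection query log. A sketch of where one might call it in the loop above (the surrounding lines are from the script; the reset call is the addition):

from django import db

# inside process_file's loop, next to the periodic commit:
if not (input_row_i % 5000):
    transaction.commit()
    db.reset_queries()  # empty connection.queries so the log cannot grow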
If that does not help, try spawning a new Process to run process_file for each file. This is not the most efficient approach, but you are trying to keep memory usage down, not CPU cycles. Something like this should get you started:
from multiprocessing import Process

for filename in files_to_process:
    p = Process(target=process_file, args=(filename,))
    p.start()
    p.join()
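Calling p.join() immediately after p.start() processes the files one at a time; the point is that each file runs in a fresh child process, and all of its memory is returned to the OS when that process exits. On Windows you must also guard the spawning code with if __name__ == '__main__':, since child processes re-import the module. If you later want a couple of files in flight at once, a Pool with maxtasksperchild=1 (available since Python 2.7) gives the same fresh-process-per-file behavior; a sketch, reusing process_file and files_to_process from above:

from multiprocessing import Pool

if __name__ == '__main__':  # required on Windows
    # maxtasksperchild=1 retires each worker after a single file,
    # so leaked memory never outlives one file.
    pool = Pool(processes=2, maxtasksperchild=1)
    pool.map(process_file, files_to_process)
    pool.close()
    pool.join()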