Accelerate bulk insert using Django's ORM?

Question

I'm planning to upload a billion records, extracted from ~750 files (each ~250MB), to a database using Django's ORM. Currently each file takes ~20 minutes to process, and I was wondering whether there's any way to accelerate this.

I've already taken the following measures (a code sketch follows the list):

  • Use @transaction.commit_manually and commit once every 5000 records
  • Set DEBUG=False so that Django won't accumulate all the SQL commands in memory
  • Keep the loop that runs over the records of a single file entirely within one function (to minimize stack changes)
  • Refrain from hitting the db with queries (use a local hash of objects already in the db instead of get_or_create)
  • Set force_insert=True in save() in the hope that it spares Django some logic
  • Explicitly set the id in the hope that it spares Django some logic
  • General code minimization and optimization
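
A minimal sketch of how these measures might fit together, assuming a hypothetical Record model with key/value fields; @transaction.commit_manually matches the Django versions this question targets (it was deprecated in 1.6 and removed in 1.8):

```python
import itertools

from django.db import transaction

from myapp.models import Record  # hypothetical model with key/value fields


@transaction.commit_manually  # pre-1.6 API; removed in Django 1.8
def load_file(path, seen, ids=itertools.count(1)):  # shared counter: ids keep rising across files
    """Insert every record in `path`, committing every 5000 rows.

    `seen` is a local dict of keys already in the db, consulted
    instead of get_or_create() so no SELECT is issued per record.
    """
    count = 0
    with open(path) as f:
        for line in f:
            key, value = line.rstrip("\n").split("\t")
            if key in seen:  # avoid hitting the db for duplicates
                continue
            rec = Record(id=next(ids), key=key, value=value)  # explicit id
            rec.save(force_insert=True)  # plain INSERT, no existence check
            seen[key] = rec.id
            count += 1
            if count % 5000 == 0:
                transaction.commit()
    transaction.commit()  # flush the final partial batch
```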

What else can I do to speed things up? Here are some of my thoughts:

  • Use some kind of Python compiler or a faster Python implementation (Psyco?)
  • Override the ORM and use SQL directly (sketched below)
  • Use some 3rd-party code that might be better (1, 2)
  • Beg the Django community to create a bulk_insert function

Any pointers regarding these items or any other ideas would be welcome :)
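
A hedged sketch of the "use SQL directly" idea: feed rows through cursor.executemany() in batches, bypassing model instantiation entirely. Table and column names are hypothetical, and transaction.atomic() is the modern (Django 1.6+) replacement for the commit_manually/commit_on_success APIs of the versions the question targets:

```python
from django.db import connection, transaction

# Hypothetical table matching the Record model above.
INSERT_SQL = "INSERT INTO myapp_record (id, key, value) VALUES (%s, %s, %s)"


def raw_bulk_insert(rows, batch_size=5000):
    """`rows` is an iterable of (id, key, value) tuples."""
    cursor = connection.cursor()
    batch = []
    with transaction.atomic():  # one transaction for the whole file
        for row in rows:
            batch.append(row)
            if len(batch) >= batch_size:
                cursor.executemany(INSERT_SQL, batch)  # one round-trip per batch
                del batch[:]
        if batch:
            cursor.executemany(INSERT_SQL, batch)  # flush the final partial batch
```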

Answer

Django 1.4 provides a bulk_create() method on QuerySet objects; see the Django documentation for QuerySet.bulk_create().
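
A usage sketch, again assuming the hypothetical Record model from above; bulk_create() turns each chunk of instances into a single multi-row INSERT. Chunking is done manually here because the batch_size argument only arrived in later releases:

```python
from myapp.models import Record  # hypothetical model


def load_file(path, chunk_size=5000):
    chunk = []
    with open(path) as f:
        for line in f:
            key, value = line.rstrip("\n").split("\t")
            chunk.append(Record(key=key, value=value))  # no db hit yet
            if len(chunk) >= chunk_size:
                Record.objects.bulk_create(chunk)  # one multi-row INSERT
                chunk = []
    if chunk:
        Record.objects.bulk_create(chunk)  # flush the final partial chunk
```

Note that bulk_create() does not call each model's save() method and does not send the pre_save/post_save signals, which is part of why it is so much faster than per-record save() calls.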
