Accelerate bulk insert using Django's ORM?
Question
I'm planning to upload a billion records taken from ~750 files (each ~250MB) to a db using django's ORM. Currently each file takes ~20min to process, and I was wondering if there's any way to accelerate this process.
I've taken the following measures:
- Use @transaction.commit_manually and commit once every 5000 records
- Set DEBUG=False so that django won't accumulate all the sql commands in memory
- The loop that runs over records in a single file is completely contained in a single function (minimize stack changes)
- Refrained from hitting the db for queries (used a local hash of objects already in the db instead of using get_or_create)
- Set force_insert=True in the save() in hopes it will save django some logic
- Explicitly set the id in hopes it will save django some logic
- General code minimization and optimization
What else can I do to speed things up? Here are some of my thoughts:
- Use some kind of Python compiler or version which is quicker (Psyco?)
- Override the ORM and use SQL directly
- Use some 3rd party code that might be better (1, 2)
- Beg the django community to create a bulk_insert function
Any pointers regarding these items or any other idea would be welcome :)
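As a rough illustration of the "use SQL directly" idea from the list above, here is a minimal sketch that batches rows through `executemany` and commits once per batch. It uses Python's standard `sqlite3` module only so that it is self-contained; the table `records`, its columns, and the helper `insert_rows` are hypothetical, and a real project would go through its own database driver instead.

```python
import sqlite3
from itertools import islice


def insert_rows(conn, rows, batch_size=5000):
    """Insert (id, value) tuples in batches, committing once per batch.

    `records` and its columns are made-up names for this sketch.
    """
    it = iter(rows)
    inserted = 0
    while True:
        # Pull at most batch_size rows so memory use stays bounded.
        batch = list(islice(it, batch_size))
        if not batch:
            break
        conn.executemany(
            "INSERT INTO records (id, value) VALUES (?, ?)", batch
        )
        conn.commit()  # one commit per batch, not per row
        inserted += len(batch)
    return inserted


# Usage with an in-memory database and generated rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, value TEXT)")
count = insert_rows(conn, ((i, f"row-{i}") for i in range(12000)))
```

The batch size of 5000 simply mirrors the commit interval mentioned in the question; the best value depends on the database and row size.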
Solution
Django 1.4 provides a bulk_create() method on the QuerySet object; see:
- https://docs.djangoproject.com/en/dev/ref/models/querysets/#django.db.models.query.QuerySet.bulk_create
- https://docs.djangoproject.com/en/dev/releases/1.4/
- https://code.djangoproject.com/ticket/7596
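
A minimal sketch of how `bulk_create()` could be applied to a large file, assuming the records can be streamed as dictionaries of field values. The helpers `chunks` and `bulk_load` are hypothetical names introduced here, `model` stands for any Django model class, and the batch size of 5000 is borrowed from the question rather than being a recommendation.

```python
from itertools import islice


def chunks(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def bulk_load(model, rows, batch_size=5000):
    """Create `model` instances from `rows` (dicts of field values).

    Each batch goes to the database through one bulk_create() call
    instead of one save() per object. `model` is assumed to be a
    Django model class; nothing here imports Django itself.
    """
    total = 0
    for chunk in chunks(rows, batch_size):
        model.objects.bulk_create([model(**row) for row in chunk])
        total += len(chunk)
    return total
```

A call might look like `bulk_load(Record, parse_file(path))`, where `Record` and `parse_file` are placeholders for your own model and parser. Note that `bulk_create()` does not call each object's `save()` method and does not send the `pre_save` or `post_save` signals, so any logic that relies on those needs to be handled separately.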