How to import *huge* chunks of data to PostgreSQL?
Problem description
I have a data structure that looks like this:
Model Place
primary key "id"
foreign key "parent" -> Place
foreign key "neighbor" -> Place (symmetric)
foreign key "belongtos" -> Place (asymmetric)
a bunch of scalar fields ...
I have over 5 million rows in the model table, and I need to insert ~50 million rows into each of the two foreign key tables. I have SQL files that look like this:
INSERT INTO place_belongtos (from_place_id, to_place_id) VALUES (123, 456);
and they are about 7 GB each. The problem is, when I do psql < belongtos.sql, it takes about 12 hours to import ~4 million rows on my AMD Turion64x2 CPU. The OS is Gentoo ~amd64, PostgreSQL is version 8.4, compiled locally. The data dir is a bind mount, located on my second extended partition (ext4), which I believe is not the bottleneck.
I suspect it takes so long to insert the foreign key relations because psql checks the key constraints for each row, which probably adds some unnecessary overhead, as I know for sure that the data is valid. Is there a way to speed up the import, i.e. temporarily disabling the constraint check?
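For context, PostgreSQL can postpone foreign-key checks until commit time if the constraint is declared DEFERRABLE. A minimal sketch (the constraint and table names here are assumptions, not taken from the actual schema; look the real names up with \d place_belongtos in psql):

```sql
-- Recreate the FK as DEFERRABLE (names are hypothetical).
ALTER TABLE place_belongtos
    DROP CONSTRAINT place_belongtos_from_place_id_fkey;
ALTER TABLE place_belongtos
    ADD CONSTRAINT place_belongtos_from_place_id_fkey
    FOREIGN KEY (from_place_id) REFERENCES place (id)
    DEFERRABLE INITIALLY IMMEDIATE;

-- Inside a transaction, postpone the checks:
BEGIN;
SET CONSTRAINTS ALL DEFERRED;
-- ... bulk INSERTs here ...
COMMIT;  -- all deferred FK checks run at commit time
```

Note that the checks are still performed per row at COMMIT; deferring mainly helps when rows reference other rows inserted in the same transaction.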
Recommended answer
- Make sure both foreign key constraints are DEFERRABLE
- Use COPY to load your data
- If you can't use COPY, use a prepared statement for your INSERT.
- Proper configuration settings will also help; check the WAL settings.
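A sketch of the COPY route, assuming the table layout from the question (the file path and constraint names are hypothetical; find the real constraint names with \d place_belongtos in psql):

```sql
-- 1. Drop the FK constraints so the load is not checked row by row
--    (constraint names are assumptions).
ALTER TABLE place_belongtos DROP CONSTRAINT place_belongtos_from_place_id_fkey;
ALTER TABLE place_belongtos DROP CONSTRAINT place_belongtos_to_place_id_fkey;

-- 2. Bulk-load with COPY; the input is tab-separated by default.
--    Server-side COPY FROM needs superuser; from psql, \copy reads
--    the file on the client instead.
COPY place_belongtos (from_place_id, to_place_id)
    FROM '/path/to/belongtos.tsv';

-- 3. Re-add the constraints; validation happens once, after the load.
ALTER TABLE place_belongtos
    ADD CONSTRAINT place_belongtos_from_place_id_fkey
    FOREIGN KEY (from_place_id) REFERENCES place (id);
ALTER TABLE place_belongtos
    ADD CONSTRAINT place_belongtos_to_place_id_fkey
    FOREIGN KEY (to_place_id) REFERENCES place (id);
```

On 8.4, raising checkpoint_segments and wal_buffers in postgresql.conf for the duration of the load also reduces WAL-related stalls.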