Efficient way to import a lot of csv files into PostgreSQL db
Problem description
I see plenty of examples of importing a CSV into a PostgreSQL db, but what I need is an efficient way to import 500,000 CSVs into a single PostgreSQL db. Each CSV is a bit over 500KB (so a grand total of approx. 272GB of data).
The CSVs are identically formatted and there are no duplicate records (the data was generated programmatically from a raw data source). I have been searching and will continue to search online for options, but I would appreciate any direction on getting this done in the most efficient manner possible. I do have some experience with Python, but will dig into any other solution that seems appropriate.
Thanks!
Recommended answer
If you start by reading the PostgreSQL guide "Populating a Database", you'll see several pieces of advice:
- Load the data in a single transaction.
- Use COPY if at all possible.
- Remove indexes, foreign key constraints etc. before loading the data and restore them afterwards.
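Taken together, those three tips can be sketched as a single SQL script (the table, column, and index names below are hypothetical, purely for illustration):

```sql
-- Hypothetical table "measurements" with an index "measurements_ts_idx".
BEGIN;

-- Drop indexes and constraints first; rebuilding them once at the end
-- is far cheaper than maintaining them row by row during the load.
DROP INDEX IF EXISTS measurements_ts_idx;

-- One COPY per file, all inside the same transaction.
COPY measurements (sensor_id, ts, value) FROM '/data/file_000001.csv' WITH (FORMAT CSV);
COPY measurements (sensor_id, ts, value) FROM '/data/file_000002.csv' WITH (FORMAT CSV);
-- ... one COPY statement per remaining file ...

-- Restore the index once the data is in place.
CREATE INDEX measurements_ts_idx ON measurements (ts);

COMMIT;
```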
PostgreSQL's COPY statement already supports the CSV format:
COPY table (column1, column2, ...) FROM '/path/to/data.csv' WITH (FORMAT CSV)
so it looks as if you are best off not using Python at all, or using Python only to generate the required sequence of COPY statements.
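A minimal sketch of that generator approach, assuming the CSV files all sit in one directory; the table and column names are placeholders, not anything from the question:

```python
from pathlib import Path


def build_copy_script(csv_dir, table, columns):
    """Build one transaction's worth of COPY statements, one per CSV
    file in csv_dir. Returns the SQL script as a string; save it and
    feed it to psql (e.g. `psql mydb -f load.sql`) to run the load."""
    col_list = ", ".join(columns)
    statements = ["BEGIN;"]
    for path in sorted(Path(csv_dir).glob("*.csv")):
        # COPY ... FROM runs server-side, so the path must be visible
        # to the PostgreSQL server process.
        statements.append(
            f"COPY {table} ({col_list}) FROM '{path.resolve()}' WITH (FORMAT CSV);"
        )
    statements.append("COMMIT;")
    return "\n".join(statements)
```

Note that COPY ... FROM reads files on the database server; if the files live on a client machine, psql's \copy meta-command takes the same options but streams the file from the client instead.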