如何在PostreSQL中批量插入新行 [英] How to bulk insert only new rows in PostreSQL

查看:155
本文介绍了如何在PostreSQL中批量插入新行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的产品列表(300万件)没有ID - 只有标题。但是我不知道DB中已经存在哪些标题。新产品(约290万件)必须加入DB。之后,我必须知道每个产品的ID(新的和现有的)。

I have list of products (3 million items) without IDs - only titles. But I don't know which titles already exist in DB. New products (about 2.9 million items) must be added into DB. After that I must know ID for each products (new and existing).

在PostgreSQL中有最快的方法吗?我可以根据需要更改数据库(添加默认值,添加列等)。

Is there the fastest way to do it in PostgreSQL? I can change DB as needed (add default values, add columns etc.).

推荐答案

导入数据



COPY 将所有内容设置为临时暂存表,并只在目标表中插入新的标题。

Import data

COPY everything to a temporary staging table and insert only new titles into your target table.

CREATE TEMP TABLE tmp(title text);

COPY tmp FROM 'path/to/file.csv';
ANALYZE tmp;

INSERT INTO tbl
SELECT DISTINCT tmp.title
FROM   tmp 
LEFT   JOIN tbl USING (title)
WHERE  tbl.title IS NULL;

应该使用 serial tbl_id tbl

LEFT JOIN / IS NULL 构造取消已经存在的标题的资格。 NOT EXISTS 将是另一种可能性。

The LEFT JOIN / IS NULL construct disqualifies already existing titles. NOT EXISTS would be another possibility.

DISTINCT 在临时表中的传入数据中复制 tmp

DISTINCT prevents duplicates in the incoming data in the temporary table tmp.

ANALYZE 有助于确保查询计划员选择一个明智的计划和临时表不通过autovacuum进行分析。

ANALYZE is useful to make sure the query planner picks a sensible plan, and temporary tables are not analyzed by autovacuum.

由于您有300万个项目,因此可能需要为 temp_buffer

Since you have 3 million items, it might pay to raise the setting for temp_buffer (for this session only):

SET temp_buffers = 1000MB;

或者你可以负担得起,足够让RAM中的临时表保持更快。注意:必须先在会话中完成 ,然后才能创建任何临时对象。

Or however much you can afford and is enough to hold the temp table in RAM, which is much faster. Note: must be done first in the session - before any temp objects are created.

要查看导入数据的所有ID:

To see all IDs for the imported data:

SELECT tbl.tbl_id, tbl.title
FROM   tbl
JOIN   tmp USING (title)

在同一个会话中!临时表在会话结束时自动删除。

In the same session! A temporary table is dropped automatically at the end of the session.

这篇关于如何在PostreSQL中批量插入新行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆