PostgreSQL batch insert or ignore

Problem description

I have the responsibility of switching our code from sqlite to postgres. One of the queries I am having trouble with is copied below.

INSERT INTO group_phones(group_id, phone_name)
SELECT g.id, p.name 
FROM phones AS p, groups as g
WHERE g.id IN ($add_groups) AND p.name IN ($phones);

The problem arises when there is a duplicate record. In this table the combination of both values must be unique. I have used a few plpgsql functions in other places to do update-or-insert operations, but in this case I can do several inserts at once. I am not sure how to write a stored routine for this. Thanks for all the help from all the sql gurus out there!

Solution

There are 3 challenges.

  1. Your query has no JOIN condition between the tables phones and groups, making this effectively a limited CROSS JOIN - which you most probably do not intend. I.e. every phone that qualifies is combined with every group that qualifies. If you have 100 phones and 100 groups that's already 10,000 combinations.

  2. Insert distinct combinations of (group_id, phone_name)

  3. Avoid inserting rows that are already there in table group_phones.

All things considered it could look like this:

INSERT INTO group_phones(group_id, phone_name)
SELECT i.id, i.name
FROM  (
    SELECT DISTINCT g.id, p.name -- get distinct combinations
    FROM   phones p
    JOIN   groups g ON ??how are p & g connected??
    WHERE  g.id IN ($add_groups)
    AND    p.name IN ($phones)
    ) i
LEFT   JOIN group_phones gp ON (gp.group_id, gp.phone_name) = (i.id, i.name)
WHERE  gp.group_id IS NULL  -- avoid duping existing rows
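
The query above can be tried end to end with a small script. This is a minimal sketch, not the answer's own code: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for PostgreSQL, the sample rows and the helper name add_phones_to_groups are made up for illustration, and because the real join condition between phones and groups is unknown, it keeps the question's cross join. The row comparison is spelled out with AND, which is equivalent to the (a, b) = (c, d) form.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE phones (name TEXT PRIMARY KEY);
    -- "groups" is quoted defensively because GROUPS is also an SQL keyword.
    CREATE TABLE "groups" (id INTEGER PRIMARY KEY);
    CREATE TABLE group_phones (
        group_id   INTEGER NOT NULL,
        phone_name TEXT    NOT NULL,
        UNIQUE (group_id, phone_name)
    );
    INSERT INTO phones VALUES ('alpha'), ('beta');
    INSERT INTO "groups" VALUES (1), (2);
    INSERT INTO group_phones VALUES (1, 'alpha');  -- row that already exists
""")


def add_phones_to_groups(conn, group_ids, phone_names):
    """Insert every missing (group_id, phone_name) pair; return rows inserted."""
    if not group_ids or not phone_names:
        return 0  # an empty IN () list is not valid in PostgreSQL
    g_marks = ", ".join("?" for _ in group_ids)
    p_marks = ", ".join("?" for _ in phone_names)
    sql = f"""
        INSERT INTO group_phones (group_id, phone_name)
        SELECT i.id, i.name
        FROM (
            SELECT DISTINCT g.id AS id, p.name AS name  -- distinct combinations
            FROM   phones p
            CROSS  JOIN "groups" g      -- assumption: real join condition unknown
            WHERE  g.id IN ({g_marks})
            AND    p.name IN ({p_marks})
        ) i
        LEFT JOIN group_phones gp
               ON gp.group_id = i.id AND gp.phone_name = i.name
        WHERE  gp.group_id IS NULL      -- skip rows that already exist
    """
    before = conn.total_changes
    conn.execute(sql, [*group_ids, *phone_names])
    return conn.total_changes - before


first = add_phones_to_groups(conn, [1, 2], ["alpha", "beta"])
second = add_phones_to_groups(conn, [1, 2], ["alpha", "beta"])
total = conn.execute("SELECT COUNT(*) FROM group_phones").fetchone()[0]
print(first, second, total)
```

The first call inserts only the three missing pairs, and running the same call again inserts nothing, which is the "insert or ignore" behaviour the question asks for.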

Concurrency

This form minimizes the chance of a race condition with concurrent write operations. If your table has heavy concurrent write load, you may want to lock the table exclusively or use serializable transaction isolation. This safeguards against the extremely unlikely case that a row is altered by a concurrent transaction in the tiny time slot between the constraint verification (row isn't there) and the write operation in the query.

BEGIN ISOLATION LEVEL SERIALIZABLE;
INSERT ...
COMMIT;

Be prepared to repeat the transaction if it rolls back with a serialization error. For more on that topic, good starting points could be this blog post by @depesz or this related question on SO.
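
Repeating the transaction usually means wrapping it in a small retry loop on the client side. The following is a rough sketch under stated assumptions: SerializationFailure is a hypothetical stand-in for whatever exception your database driver raises for SQLSTATE 40001 (serialization_failure), run_with_retry is an illustrative helper name, and the "transaction" here is a fake function so the example runs without a database.

```python
import random
import time


class SerializationFailure(Exception):
    """Hypothetical stand-in for the driver's SQLSTATE 40001 error."""


def run_with_retry(txn, max_attempts=5, base_delay=0.01):
    """Call txn() until it succeeds or the attempts run out.

    txn is expected to run the whole transaction (BEGIN ... COMMIT) and to
    raise SerializationFailure when the database rolls it back.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except SerializationFailure:
            if attempt == max_attempts:
                raise  # give up and let the caller see the error
            # Short randomized backoff so competing writers do not retry in lockstep.
            time.sleep(base_delay * attempt * random.random())


# Usage with a fake transaction that fails twice, then succeeds.
calls = {"n": 0}


def flaky_txn():
    calls["n"] += 1
    if calls["n"] < 3:
        raise SerializationFailure("could not serialize access")
    return "committed"


result = run_with_retry(flaky_txn)
print(result, calls["n"])
```

The important design point is that the whole transaction is re-run from the start, not just the failed statement, because the rollback discards everything done since BEGIN.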

Normally, though, you needn't even bother with any of this.

Performance

LEFT JOIN tbl ON right_col = left_col WHERE right_col IS NULL

is generally the fastest method with distinct columns in the right table. If you have dupes in the column (especially if there are many),

WHERE NOT EXISTS (SELECT 1 FROM tbl WHERE right_col = left_col)

may be faster because it can stop scanning as soon as the first row is found.
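
Both forms are anti-joins and return the same rows, even when the right-hand table contains duplicates. A small sketch to check this, using an in-memory SQLite database as a stand-in for PostgreSQL (the table lhs and the sample values are made up for illustration; it shows equivalence of results only and says nothing about relative speed):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lhs (left_col  INTEGER NOT NULL);
    CREATE TABLE tbl (right_col INTEGER NOT NULL);  -- no unique constraint: dupes allowed
    INSERT INTO lhs VALUES (1), (2), (3), (4);
    INSERT INTO tbl VALUES (2), (2), (2), (4);
""")

# Anti-join written as LEFT JOIN ... IS NULL.
left_join = conn.execute("""
    SELECT l.left_col
    FROM   lhs l
    LEFT   JOIN tbl t ON t.right_col = l.left_col
    WHERE  t.right_col IS NULL
    ORDER  BY l.left_col
""").fetchall()

# Same anti-join written with NOT EXISTS.
not_exists = conn.execute("""
    SELECT l.left_col
    FROM   lhs l
    WHERE  NOT EXISTS (SELECT 1 FROM tbl t WHERE t.right_col = l.left_col)
    ORDER  BY l.left_col
""").fetchall()

print(left_join, not_exists)
```

Which form is actually faster depends on the data and the planner, so it is worth comparing both with EXPLAIN ANALYZE on your own tables.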

You can also use IN, like @dezso demonstrates, but it is usually slower in PostgreSQL.
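
Not covered in the answer above: PostgreSQL 9.5 and later also support INSERT ... ON CONFLICT DO NOTHING, which skips rows that would violate a unique constraint. This is a different technique from the anti-join shown earlier, and it requires a unique constraint or index on (group_id, phone_name). The sketch below assumes such a constraint and again uses in-memory SQLite (3.24 or later, which accepts the same clause) as a stand-in for PostgreSQL; the sample rows are made up and the cross join is kept only because the real join condition is unknown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE phones (name TEXT PRIMARY KEY);
    CREATE TABLE "groups" (id INTEGER PRIMARY KEY);
    CREATE TABLE group_phones (
        group_id   INTEGER NOT NULL,
        phone_name TEXT    NOT NULL,
        UNIQUE (group_id, phone_name)   -- required for ON CONFLICT to apply
    );
    INSERT INTO phones VALUES ('alpha'), ('beta');
    INSERT INTO "groups" VALUES (1), (2);
    INSERT INTO group_phones VALUES (1, 'alpha');  -- row that already exists
""")

# Conflicting rows are silently skipped instead of raising an error.
conn.execute("""
    INSERT INTO group_phones (group_id, phone_name)
    SELECT g.id, p.name
    FROM   phones p
    CROSS  JOIN "groups" g              -- assumption: real join condition unknown
    WHERE  g.id IN (1, 2)
    AND    p.name IN ('alpha', 'beta')
    ON CONFLICT (group_id, phone_name) DO NOTHING
""")

stored = conn.execute(
    "SELECT group_id, phone_name FROM group_phones ORDER BY group_id, phone_name"
).fetchall()
print(stored)
```

Because the conflict check happens at insert time, this form also avoids the small race window described in the Concurrency section.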
