Postgresql batch insert or ignore


Problem description

I have the responsibility of switching our code from sqlite to postgres. One of the queries I am having trouble with is copied below.

INSERT INTO group_phones(group_id, phone_name)
SELECT g.id, p.name 
FROM phones AS p, groups as g
WHERE g.id IN ($add_groups) AND p.name IN ($phones);

The problem arises when there is a duplicate record. In this table the combination of both values must be unique. I have used a few plpgsql functions in other places to do update-or-insert operations, but in this case I can do several inserts at once. I am not sure how to write a stored routine for this. Thanks for all the help from all the sql gurus out there!

Recommended answer

3 challenges.

  1. Your query has no JOIN condition between the tables phones and groups, making this effectively a limited CROSS JOIN - which you most probably do not intend. I.e. every phone that qualifies is combined with every group that qualifies. If you have 100 phones and 100 groups that's already 10,000 combinations.

  2. Insert distinct combinations of (group_id, phone_name).

  3. Avoid inserting rows that are already there in table group_phones.

All things considered it could look like this:

INSERT INTO group_phones(group_id, phone_name)
SELECT i.id, i.name
FROM  (
    SELECT DISTINCT g.id, p.name -- get distinct combinations
    FROM   phones p
    JOIN   groups g ON ??how are p & g connected??
    WHERE  g.id IN ($add_groups)
    AND    p.name IN ($phones)
    ) i
LEFT   JOIN group_phones gp ON (gp.group_id, gp.phone_name) = (i.id, i.name)
WHERE  gp.group_id IS NULL  -- avoid duping existing rows
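A minimal runnable sketch of the anti-join pattern above, using Python's sqlite3 so the demo is self-contained (the `candidates` table and its data are made up here to stand in for the derived subquery `i`; the LEFT JOIN / IS NULL logic is the same in PostgreSQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE group_phones (group_id INTEGER, phone_name TEXT,
                               UNIQUE (group_id, phone_name));
    INSERT INTO group_phones VALUES (1, 'alice');   -- pre-existing row

    -- stands in for the distinct (g.id, p.name) subquery in the answer
    CREATE TABLE candidates (id INTEGER, name TEXT);
    INSERT INTO candidates VALUES (1, 'alice'), (1, 'bob'), (2, 'alice');
""")

# Anti-join insert: (1, 'alice') already exists and must be skipped,
# otherwise the UNIQUE constraint would make the whole INSERT fail.
conn.execute("""
    INSERT INTO group_phones (group_id, phone_name)
    SELECT i.id, i.name
    FROM   candidates i
    LEFT   JOIN group_phones gp
           ON gp.group_id = i.id AND gp.phone_name = i.name
    WHERE  gp.group_id IS NULL   -- keep only rows not already present
""")

rows = sorted(conn.execute("SELECT * FROM group_phones"))
print(rows)  # → [(1, 'alice'), (1, 'bob'), (2, 'alice')]
```

The pre-existing (1, 'alice') row was not duplicated; the two new combinations went in with a single statement.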

Concurrency

This form minimizes the chance of a race condition with concurrent write operations. If your table has heavy concurrent write load, you may want to lock the table exclusively or use serializable transaction isolation. This safeguards against the extremely unlikely case that a row is altered by a concurrent transaction in the tiny time slot between the constraint verification (the row isn't there) and the write operation in the query.

BEGIN ISOLATION LEVEL SERIALIZABLE;
INSERT ...
COMMIT;

Be prepared to repeat the transaction if it rolls back with a serialization error. For more on that topic, good starting points could be this blog post by @depesz or this related question on SO.
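The repeat-on-rollback logic can be a small wrapper around the transaction function. This is only a sketch: `SerializationError` here is a hypothetical stand-in for the driver-specific exception (in PostgreSQL a serialization failure is reported with SQLSTATE 40001), and `flaky_txn` fakes a transaction that conflicts twice before committing.

```python
class SerializationError(Exception):
    """Stand-in for a database driver's serialization-failure error
    (PostgreSQL reports these with SQLSTATE 40001)."""

def run_with_retry(txn, attempts=5):
    # Re-run the transaction function until it commits or we give up.
    for attempt in range(attempts):
        try:
            return txn()
        except SerializationError:
            if attempt == attempts - 1:
                raise  # out of attempts: propagate the failure

# Demo: a "transaction" that fails twice before succeeding.
calls = {"n": 0}
def flaky_txn():
    calls["n"] += 1
    if calls["n"] < 3:
        raise SerializationError
    return "committed"

print(run_with_retry(flaky_txn))  # → committed (after two retries)
```

A real wrapper would also roll back and reopen the transaction between attempts, and often adds a short randomized sleep so two conflicting transactions don't collide again immediately.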

Normally, though, you needn't even bother with any of this.

LEFT JOIN tbl ON right_col = left_col WHERE right_col IS NULL

is generally the fastest method with distinct columns in the right table. If you have dupes in the column (especially if there are many),

WHERE NOT EXISTS (SELECT 1 FROM tbl WHERE right_col = left_col)

may be faster because it can stop scanning as soon as the first row is found.

You can also use IN, like @dezso demonstrates, but it is usually slower in PostgreSQL.
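That the two anti-join spellings select the same rows is easy to check with made-up data; SQLite again keeps the sketch self-contained, but both forms work identically in PostgreSQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE existing (col TEXT);
    INSERT INTO existing VALUES ('a'), ('b');
    CREATE TABLE incoming (col TEXT);
    INSERT INTO incoming VALUES ('a'), ('c'), ('c');  -- note the dupe 'c'
""")

# Anti-join form: LEFT JOIN ... IS NULL
left_join = conn.execute("""
    SELECT i.col FROM incoming i
    LEFT JOIN existing e ON e.col = i.col
    WHERE e.col IS NULL
""").fetchall()

# NOT EXISTS form: can stop probing 'existing' at the first match
not_exists = conn.execute("""
    SELECT i.col FROM incoming i
    WHERE NOT EXISTS (SELECT 1 FROM existing e WHERE e.col = i.col)
""").fetchall()

print(left_join == not_exists)  # → True (both return the two 'c' rows)
```

Which one is faster depends on the data, as the answer notes: with many duplicates on the probed column, NOT EXISTS tends to win because each probe can short-circuit.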
