如何在 PostgreSQL 中 UPSERT (MERGE, INSERT ... ON DUPLICATE UPDATE)? [英] How to UPSERT (MERGE, INSERT ... ON DUPLICATE UPDATE) in PostgreSQL?
问题描述
这里一个非常常见的问题是如何进行 upsert,这就是 MySQL 所说的 INSERT ... ON DUPLICATE UPDATE
并且标准支持作为 MERGE
操作.
A very frequently asked question here is how to do an upsert, which is what MySQL calls INSERT ... ON DUPLICATE UPDATE
and the standard supports as part of the MERGE
operation.
鉴于 PostgreSQL 不直接支持它(在 pg 9.5 之前),你如何做到这一点?考虑以下几点:
Given that PostgreSQL doesn't support it directly (before pg 9.5), how do you do this? Consider the following:
CREATE TABLE testtable (
id integer PRIMARY KEY,
somedata text NOT NULL
);
INSERT INTO testtable (id, somedata) VALUES
(1, 'fred'),
(2, 'bob');
现在假设您要插入"元组 (2, 'Joe')
, (3, 'Alan')
,因此新表的内容将是:
Now imagine that you want to "upsert" the tuples (2, 'Joe')
, (3, 'Alan')
, so the new table contents would be:
(1, 'fred'),
(2, 'Joe'), -- Changed value of existing tuple
(3, 'Alan') -- Added new tuple
这就是人们在讨论 upsert
时所谈论的内容.至关重要的是,任何方法都必须在同一个表上存在多个事务的情况下是安全的——要么使用显式锁定,要么以其他方式防御由此产生的竞争条件.
That's what people are talking about when discussing an upsert
. Crucially, any approach must be safe in the presence of multiple transactions working on the same table - either by using explicit locking, or otherwise defending against the resulting race conditions.
在在 PostgreSQL 中插入重复更新? 中广泛讨论了此主题,但这是关于 MySQL 语法的替代方案,而且随着时间的推移,它增加了一些不相关的细节.我正在寻找明确的答案.
This topic is discussed extensively at Insert, on duplicate update in PostgreSQL?, but that's about alternatives to the MySQL syntax, and it's grown a fair bit of unrelated detail over time. I'm working on definitive answers.
这些技术也适用于如果不存在则插入,否则什么都不做",即在重复键忽略时插入...".
These techniques are also useful for "insert if not exists, otherwise do nothing", i.e. "insert ... on duplicate key ignore".
推荐答案
9.5 及更高版本:
PostgreSQL 9.5 及更新版本支持 INSERT ... ON CONFLICT (key) DO UPDATE
(和 ON CONFLICT (key) DO NOTHING
),即 upsert.
9.5 and newer:
PostgreSQL 9.5 and newer support INSERT ... ON CONFLICT (key) DO UPDATE
(and ON CONFLICT (key) DO NOTHING
), i.e. upsert.
快速说明.
有关用法,请参阅手册 - 特别是conflict_action 子句,以及 说明文字.
For usage see the manual - specifically the conflict_action clause in the syntax diagram, and the explanatory text.
与下面给出的 9.4 及更早版本的解决方案不同,此功能适用于多个冲突行,并且不需要排他锁定或重试循环.
Unlike the solutions for 9.4 and older that are given below, this feature works with multiple conflicting rows and it doesn't require exclusive locking or a retry loop.
如果您使用的是 9.5 并且不需要向后兼容,您可以立即停止阅读.
PostgreSQL 没有任何内置的UPSERT
(或MERGE
)工具,在并发使用的情况下很难高效地做到这一点.
PostgreSQL doesn't have any built-in UPSERT
(or MERGE
) facility, and doing it efficiently in the face of concurrent use is very difficult.
通常,您必须在两个选项之间进行选择:
In general you must choose between two options:
- 重试循环中的单个插入/更新操作;或
- 锁定表并进行批量合并
如果您希望多个连接同时尝试执行插入,则在重试循环中使用单个行更新插入是合理的选择.
Using individual row upserts in a retry loop is the reasonable option if you want many connections concurrently trying to perform inserts.
PostgreSQL 文档包含一个有用的程序可以让您在数据库内的循环中执行此操作.与大多数幼稚的解决方案不同,它可以防止丢失更新和插入竞争.不过,它只能在 READ COMMITTED
模式下工作,并且只有当它是您在事务中唯一要做的事情时才是安全的.如果触发器或辅助唯一键导致唯一违规,该功能将无法正常工作.
The PostgreSQL documentation contains a useful procedure that'll let you do this in a loop inside the database. It guards against lost updates and insert races, unlike most naive solutions. It will only work in READ COMMITTED
mode and is only safe if it's the only thing you do in the transaction, though. The function won't work correctly if triggers or secondary unique keys cause unique violations.
这种策略非常低效.在可行的情况下,您应该将工作排队并进行如下所述的批量更新.
This strategy is very inefficient. Whenever practical you should queue up work and do a bulk upsert as described below instead.
许多尝试解决此问题的方法都没有考虑回滚,因此导致更新不完整.两笔交易相互竞争;其中之一成功INSERT
;另一个收到重复的密钥错误并执行 UPDATE
代替.UPDATE
阻塞等待 INSERT
回滚或提交.当它回滚时,UPDATE
条件重新检查匹配零行,因此即使 UPDATE
提交它实际上并没有完成您期望的更新插入.您必须检查结果行数并在必要时重试.
Many attempted solutions to this problem fail to consider rollbacks, so they result in incomplete updates. Two transactions race with each other; one of them successfully INSERT
s; the other gets a duplicate key error and does an UPDATE
instead. The UPDATE
blocks waiting for the INSERT
to rollback or commit. When it rolls back, the UPDATE
condition re-check matches zero rows, so even though the UPDATE
commits it hasn't actually done the upsert you expected. You have to check the result row counts and re-try where necessary.
一些尝试的解决方案也没有考虑 SELECT 比赛.如果你尝试明显而简单的:
Some attempted solutions also fail to consider SELECT races. If you try the obvious and simple:
-- THIS IS WRONG. DO NOT COPY IT. It's an EXAMPLE.
BEGIN;
UPDATE testtable
SET somedata = 'blah'
WHERE id = 2;
-- Remember, this is WRONG. Do NOT COPY IT.
INSERT INTO testtable (id, somedata)
SELECT 2, 'blah'
WHERE NOT EXISTS (SELECT 1 FROM testtable WHERE testtable.id = 2);
COMMIT;
然后当两个同时运行时会出现多种故障模式.一个是已经讨论过的更新重新检查问题.另一个是 UPDATE
同时匹配零行并继续.然后他们都做 EXISTS
测试,在 INSERT
之前.两者都得到零行,因此都执行 INSERT
.一个因重复密钥错误而失败.
then when two run at once there are several failure modes. One is the already discussed issue with an update re-check. Another is where both UPDATE
at the same time, matching zero rows and continuing. Then they both do the EXISTS
test, which happens before the INSERT
. Both get zero rows, so both do the INSERT
. One fails with a duplicate key error.
这就是您需要重试循环的原因.您可能认为使用巧妙的 SQL 可以防止重复键错误或丢失更新,但您不能.您需要检查行数或处理重复的键错误(取决于选择的方法)并重试.
This is why you need a re-try loop. You might think that you can prevent duplicate key errors or lost updates with clever SQL, but you can't. You need to check row counts or handle duplicate key errors (depending on the chosen approach) and re-try.
请不要为此推出您自己的解决方案.就像消息队列一样,它可能是错误的.
Please don't roll your own solution for this. Like with message queuing, it's probably wrong.
有时您想要进行批量更新插入,其中您有一个新数据集要合并到旧的现有数据集.这大大比单个行的 upsert 更有效,并且在可行的情况下应该是首选.
Sometimes you want to do a bulk upsert, where you have a new data set that you want to merge into an older existing data set. This is vastly more efficient than individual row upserts and should be preferred whenever practical.
在这种情况下,您通常遵循以下流程:
In this case, you typically follow the following process:
CREATE
一个TEMPORARY
表
COPY
或将新数据批量插入临时表
COPY
or bulk-insert the new data into the temp table
LOCK
目标表 IN EXCLUSIVE MODE
.这允许其他事务SELECT
,但不能对表进行任何更改.
LOCK
the target table IN EXCLUSIVE MODE
. This permits other transactions to SELECT
, but not make any changes to the table.
使用临时表中的值对现有记录执行UPDATE ... FROM
;
Do an UPDATE ... FROM
of existing records using the values in the temp table;
对目标表中尚不存在的行执行INSERT
;
Do an INSERT
of rows that don't already exist in the target table;
COMMIT
,释放锁.
例如,对于问题中给出的示例,使用多值INSERT
来填充临时表:
For example, for the example given in the question, using multi-valued INSERT
to populate the temp table:
BEGIN;
CREATE TEMPORARY TABLE newvals(id integer, somedata text);
INSERT INTO newvals(id, somedata) VALUES (2, 'Joe'), (3, 'Alan');
LOCK TABLE testtable IN EXCLUSIVE MODE;
UPDATE testtable
SET somedata = newvals.somedata
FROM newvals
WHERE newvals.id = testtable.id;
INSERT INTO testtable
SELECT newvals.id, newvals.somedata
FROM newvals
LEFT OUTER JOIN testtable ON (testtable.id = newvals.id)
WHERE testtable.id IS NULL;
COMMIT;
相关阅读
- UPSERT 维基页面
- Postgres 中的 UPSERTisms
- 在 PostgreSQL 重复更新时插入?
- http://petereisentraut.blogspot.com/2010/05/merge-语法.html
- 更新交易
- 函数中的 SELECT 或 INSERT 是否容易出现竞争条件?
- PostgreSQL wiki 上的 SQL
MERGE
- 当今在 Postgresql 中实现 UPSERT 的最惯用的方法
INSERT ... ON DUPLICATE KEY在 MySQL 中更新
MERGE
来自 MS SQL Server(但请参阅上文关于MERGE
问题)MERGE
来自 Oracle(但请参阅上文关于MERGE
问题)INSERT ... ON DUPLICATE KEY UPDATE
in MySQLMERGE
from MS SQL Server (but see above aboutMERGE
problems)MERGE
from Oracle (but see above aboutMERGE
problems)
SQL-standard MERGE
实际上定义了很差的并发语义,不适合在没有先锁定表的情况下进行更新插入.
SQL-standard MERGE
actually has poorly defined concurrency semantics and is not suitable for upserting without locking a table first.
这是一个非常有用的数据合并 OLAP 语句,但它实际上不是并发安全 upsert 的有用解决方案.对于使用其他 DBMS 的人使用 MERGE
进行更新插入的人有很多建议,但实际上这是错误的.
It's a really useful OLAP statement for data merging, but it's not actually a useful solution for concurrency-safe upsert. There's lots of advice to people using other DBMSes to use MERGE
for upserts, but it's actually wrong.
这篇关于如何在 PostgreSQL 中 UPSERT (MERGE, INSERT ... ON DUPLICATE UPDATE)?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!