如何在 PostgreSQL 中 UPSERT (MERGE, INSERT ... ON DUPLICATE UPDATE)? [英] How to UPSERT (MERGE, INSERT ... ON DUPLICATE UPDATE) in PostgreSQL?

查看:31
本文介绍了如何在 PostgreSQL 中 UPSERT (MERGE, INSERT ... ON DUPLICATE UPDATE)?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这里一个非常常见的问题是如何进行 upsert,这就是 MySQL 所说的 INSERT ... ON DUPLICATE UPDATE 并且标准支持作为 MERGE 操作.

A very frequently asked question here is how to do an upsert, which is what MySQL calls INSERT ... ON DUPLICATE UPDATE and the standard supports as part of the MERGE operation.

鉴于 PostgreSQL 不直接支持它(在 pg 9.5 之前),你如何做到这一点?考虑以下几点:

Given that PostgreSQL doesn't support it directly (before pg 9.5), how do you do this? Consider the following:

CREATE TABLE testtable (
    id integer PRIMARY KEY,
    somedata text NOT NULL
);

INSERT INTO testtable (id, somedata) VALUES
(1, 'fred'),
(2, 'bob');

现在假设您要插入"元组 (2, 'Joe'), (3, 'Alan'),因此新表的内容将是:

Now imagine that you want to "upsert" the tuples (2, 'Joe'), (3, 'Alan'), so the new table contents would be:

(1, 'fred'),
(2, 'Joe'),    -- Changed value of existing tuple
(3, 'Alan')    -- Added new tuple

这就是人们在讨论 upsert 时所谈论的内容.至关重要的是,任何方法都必须在同一个表上存在多个事务的情况下是安全的——要么使用显式锁定,要么以其他方式防御由此产生的竞争条件.

That's what people are talking about when discussing an upsert. Crucially, any approach must be safe in the presence of multiple transactions working on the same table - either by using explicit locking, or otherwise defending against the resulting race conditions.

在 PostgreSQL 中插入重复更新? 中广泛讨论了此主题,但这是关于 MySQL 语法的替代方案,而且随着时间的推移,它增加了一些不相关的细节.我正在寻找明确的答案.

This topic is discussed extensively at Insert, on duplicate update in PostgreSQL?, but that's about alternatives to the MySQL syntax, and it's grown a fair bit of unrelated detail over time. I'm working on definitive answers.

这些技术也适用于如果不存在则插入,否则什么都不做",即在重复键忽略时插入...".

These techniques are also useful for "insert if not exists, otherwise do nothing", i.e. "insert ... on duplicate key ignore".

推荐答案

9.5 及更高版本:

PostgreSQL 9.5 及更新版本支持 INSERT ... ON CONFLICT (key) DO UPDATE(和 ON CONFLICT (key) DO NOTHING),即 upsert.

9.5 and newer:

PostgreSQL 9.5 and newer support INSERT ... ON CONFLICT (key) DO UPDATE (and ON CONFLICT (key) DO NOTHING), i.e. upsert.

ON DUPLICATE KEY UPDATE的比较.

快速说明.

有关用法,请参阅手册 - 特别是conflict_action 子句,以及 说明文字.

For usage see the manual - specifically the conflict_action clause in the syntax diagram, and the explanatory text.

与下面给出的 9.4 及更早版本的解决方案不同,此功能适用于多个冲突行,并且不需要排他锁定或重试循环.

Unlike the solutions for 9.4 and older that are given below, this feature works with multiple conflicting rows and it doesn't require exclusive locking or a retry loop.

添加功能的提交在这里有关其开发的讨论在这里.

如果您使用的是 9.5 并且不需要向后兼容,您可以立即停止阅读.

PostgreSQL 没有任何内置的UPSERT(或MERGE)工具,在并发使用的情况下很难高效地做到这一点.

PostgreSQL doesn't have any built-in UPSERT (or MERGE) facility, and doing it efficiently in the face of concurrent use is very difficult.

本文详细讨论了这个问题.

通常,您必须在两个选项之间进行选择:

In general you must choose between two options:

  • 重试循环中的单个插入/更新操作;或
  • 锁定表并进行批量合并

如果您希望多个连接同时尝试执行插入,则在重试循环中使用单个行更新插入是合理的选择.

Using individual row upserts in a retry loop is the reasonable option if you want many connections concurrently trying to perform inserts.

PostgreSQL 文档包含一个有用的程序可以让您在数据库内的循环中执行此操作.与大多数幼稚的解决方案不同,它可以防止丢失更新和插入竞争.不过,它只能在 READ COMMITTED 模式下工作,并且只有当它是您在事务中唯一要做的事情时才是安全的.如果触发器或辅助唯一键导致唯一违规,该功能将无法正常工作.

The PostgreSQL documentation contains a useful procedure that'll let you do this in a loop inside the database. It guards against lost updates and insert races, unlike most naive solutions. It will only work in READ COMMITTED mode and is only safe if it's the only thing you do in the transaction, though. The function won't work correctly if triggers or secondary unique keys cause unique violations.

这种策略非常低效.在可行的情况下,您应该将工作排队并进行如下所述的批量更新.

This strategy is very inefficient. Whenever practical you should queue up work and do a bulk upsert as described below instead.

许多尝试解决此问题的方法都没有考虑回滚,因此导致更新不完整.两笔交易相互竞争;其中之一成功INSERT;另一个收到重复的密钥错误并执行 UPDATE 代替.UPDATE 阻塞等待 INSERT 回滚或提交.当它回滚时,UPDATE 条件重新检查匹配零行,因此即使 UPDATE 提交它实际上并没有完成您期望的更新插入.您必须检查结果行数并在必要时重试.

Many attempted solutions to this problem fail to consider rollbacks, so they result in incomplete updates. Two transactions race with each other; one of them successfully INSERTs; the other gets a duplicate key error and does an UPDATE instead. The UPDATE blocks waiting for the INSERT to rollback or commit. When it rolls back, the UPDATE condition re-check matches zero rows, so even though the UPDATE commits it hasn't actually done the upsert you expected. You have to check the result row counts and re-try where necessary.

一些尝试的解决方案也没有考虑 SELECT 比赛.如果你尝试明显而简单的:

Some attempted solutions also fail to consider SELECT races. If you try the obvious and simple:

-- THIS IS WRONG. DO NOT COPY IT. It's an EXAMPLE.

BEGIN;

UPDATE testtable
SET somedata = 'blah'
WHERE id = 2;

-- Remember, this is WRONG. Do NOT COPY IT.

INSERT INTO testtable (id, somedata)
SELECT 2, 'blah'
WHERE NOT EXISTS (SELECT 1 FROM testtable WHERE testtable.id = 2);

COMMIT;

然后当两个同时运行时会出现多种故障模式.一个是已经讨论过的更新重新检查问题.另一个是 UPDATE 同时匹配零行并继续.然后他们都做 EXISTS 测试,在 INSERT 之前.两者都得到零行,因此都执行 INSERT.一个因重复密钥错误而失败.

then when two run at once there are several failure modes. One is the already discussed issue with an update re-check. Another is where both UPDATE at the same time, matching zero rows and continuing. Then they both do the EXISTS test, which happens before the INSERT. Both get zero rows, so both do the INSERT. One fails with a duplicate key error.

这就是您需要重试循环的原因.您可能认为使用巧妙的 SQL 可以防止重复键错误或丢失更新,但您不能.您需要检查行数或处理重复的键错误(取决于选择的方法)并重试.

This is why you need a re-try loop. You might think that you can prevent duplicate key errors or lost updates with clever SQL, but you can't. You need to check row counts or handle duplicate key errors (depending on the chosen approach) and re-try.

请不要为此推出您自己的解决方案.就像消息队列一样,它可能是错误的.

Please don't roll your own solution for this. Like with message queuing, it's probably wrong.

有时您想要进行批量更新插入,其中您有一个新数据集要合并到旧的现有数据集.这大大比单个行的 upsert 更有效,并且在可行的情况下应该是首选.

Sometimes you want to do a bulk upsert, where you have a new data set that you want to merge into an older existing data set. This is vastly more efficient than individual row upserts and should be preferred whenever practical.

在这种情况下,您通常遵循以下流程:

In this case, you typically follow the following process:

  • CREATE一个TEMPORARY

COPY 或将新数据批量插入临时表

COPY or bulk-insert the new data into the temp table

LOCK 目标表 IN EXCLUSIVE MODE.这允许其他事务SELECT,但不能对表进行任何更改.

LOCK the target table IN EXCLUSIVE MODE. This permits other transactions to SELECT, but not make any changes to the table.

使用临时表中的值对现有记录执行UPDATE ... FROM

Do an UPDATE ... FROM of existing records using the values in the temp table;

对目标表中尚不存在的行执行INSERT

Do an INSERT of rows that don't already exist in the target table;

COMMIT,释放锁.

例如,对于问题中给出的示例,使用多值INSERT 来填充临时表:

For example, for the example given in the question, using multi-valued INSERT to populate the temp table:

BEGIN;

CREATE TEMPORARY TABLE newvals(id integer, somedata text);

INSERT INTO newvals(id, somedata) VALUES (2, 'Joe'), (3, 'Alan');

LOCK TABLE testtable IN EXCLUSIVE MODE;

UPDATE testtable
SET somedata = newvals.somedata
FROM newvals
WHERE newvals.id = testtable.id;

INSERT INTO testtable
SELECT newvals.id, newvals.somedata
FROM newvals
LEFT OUTER JOIN testtable ON (testtable.id = newvals.id)
WHERE testtable.id IS NULL;

COMMIT;

相关阅读

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆