How to UPSERT (MERGE, INSERT ... ON DUPLICATE UPDATE) in PostgreSQL?

Problem description

A very frequently asked question here is how to do an upsert, which is what MySQL calls INSERT ... ON DUPLICATE UPDATE and the standard supports as part of the MERGE operation.

Given that PostgreSQL doesn't support it directly (before pg 9.5), how do you do this? Consider the following:

CREATE TABLE testtable (
    id integer PRIMARY KEY,
    somedata text NOT NULL
);

INSERT INTO testtable (id, somedata) VALUES
(1, 'fred'),
(2, 'bob');

Now imagine that you want to "upsert" the tuples (2, 'Joe'), (3, 'Alan'), so the new table contents would be:

(1, 'fred'),
(2, 'Joe'),    -- Changed value of existing tuple
(3, 'Alan')    -- Added new tuple

That's what people are talking about when discussing an upsert. Crucially, any approach must be safe in the presence of multiple transactions working on the same table - either by using explicit locking, or otherwise defending against the resulting race conditions.

This topic is discussed extensively at Insert, on duplicate update in PostgreSQL?, but that's about alternatives to the MySQL syntax, and it's grown a fair bit of unrelated detail over time. I'm working on definitive answers.

These techniques are also useful for "insert if not exists, otherwise do nothing", i.e. "insert ... on duplicate key ignore".

Recommended answer

9.5 and newer:

PostgreSQL 9.5 and newer support INSERT ... ON CONFLICT DO UPDATE (and ON CONFLICT DO NOTHING), i.e. upsert.

Comparison with ON DUPLICATE KEY UPDATE.

Quick explanation.

For usage see the manual - specifically the conflict_action clause in the syntax diagram, and the explanatory text.

Unlike the solutions for 9.4 and older that are given below, this feature works with multiple conflicting rows and it doesn't require exclusive locking or a retry loop.
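
For the example in the question, the 9.5+ syntax could look like the sketch below. The table definition and starting rows are repeated from the question only so that the snippet runs on its own. EXCLUDED is the name PostgreSQL gives to the row that was proposed for insertion.

```sql
-- Setup repeated from the question so the sketch is self-contained.
CREATE TABLE IF NOT EXISTS testtable (
    id integer PRIMARY KEY,
    somedata text NOT NULL
);

INSERT INTO testtable (id, somedata) VALUES (1, 'fred'), (2, 'bob')
ON CONFLICT (id) DO NOTHING;

-- Upsert both tuples in a single statement (PostgreSQL 9.5+).
-- Row 2 already exists, so it is updated; row 3 is inserted.
INSERT INTO testtable (id, somedata)
VALUES (2, 'Joe'), (3, 'Alan')
ON CONFLICT (id) DO UPDATE
    SET somedata = EXCLUDED.somedata;

-- "Insert if not exists, otherwise do nothing":
-- row 3 exists by now, so this statement leaves it untouched.
INSERT INTO testtable (id, somedata)
VALUES (3, 'Someone else')
ON CONFLICT (id) DO NOTHING;
```

One caveat: a single INSERT ... ON CONFLICT DO UPDATE statement cannot update the same row twice, so the incoming rows should not contain the same key more than once.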

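See also the commit that added the feature and the discussion around its development.

If you're on 9.5 and don't need to be backward-compatible, you can stop reading now.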

9.4 and older:

PostgreSQL 9.4 and older don't have any built-in UPSERT (or MERGE) facility, and doing it efficiently in the face of concurrent use is very difficult.

This article discusses the problem in detail.

In general you must choose between two options:

  • Individual insert/update operations in a retry loop; or
  • Locking the table and doing a batch merge

Using individual row upserts in a retry loop is the reasonable option if you want many connections concurrently trying to perform inserts.

The PostgreSQL documentation contains a useful procedure that'll let you do this in a loop inside the database. It guards against lost updates and insert races, unlike most naive solutions. It will only work in READ COMMITTED mode and is only safe if it's the only thing you do in the transaction, though. The function won't work correctly if triggers or secondary unique keys cause unique violations.
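
The documented procedure has roughly the following shape. This sketch adapts that pattern to the testtable from the question; the function name upsert_testtable and the parameter names p_id and p_somedata are made up for illustration. All the caveats above still apply: READ COMMITTED only, nothing else in the transaction, and no triggers or secondary unique keys that raise unique violations.

```sql
-- Table definition repeated from the question so the sketch is self-contained.
CREATE TABLE IF NOT EXISTS testtable (
    id integer PRIMARY KEY,
    somedata text NOT NULL
);

-- Hypothetical helper: update-first upsert in a retry loop.
CREATE OR REPLACE FUNCTION upsert_testtable(p_id integer, p_somedata text)
RETURNS void AS
$$
BEGIN
    LOOP
        -- First try to update an existing row.
        UPDATE testtable SET somedata = p_somedata WHERE id = p_id;
        IF found THEN
            RETURN;
        END IF;

        -- No row was updated, so try to insert. If another transaction
        -- inserts the same key concurrently, this raises unique_violation.
        BEGIN
            INSERT INTO testtable (id, somedata) VALUES (p_id, p_somedata);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            NULL;  -- do nothing; loop around and try the UPDATE again
        END;
    END LOOP;
END;
$$
LANGUAGE plpgsql;

-- Usage: one call per row, each in its own transaction.
SELECT upsert_testtable(2, 'Joe');
SELECT upsert_testtable(3, 'Alan');
```

The inner BEGIN ... EXCEPTION block sets up a subtransaction on every attempt, which is part of why this approach costs more than a set-based merge.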

This strategy is very inefficient. Whenever practical you should queue up work and do a bulk upsert as described below instead.

Many attempted solutions to this problem fail to consider rollbacks, so they result in incomplete updates. Two transactions race with each other; one of them successfully INSERTs; the other gets a duplicate key error and does an UPDATE instead. The UPDATE blocks waiting for the INSERT to rollback or commit. When it rolls back, the UPDATE condition re-check matches zero rows, so even though the UPDATE commits it hasn't actually done the upsert you expected. You have to check the result row counts and re-try where necessary.
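
To make the row-count check concrete, here is a sketch of an insert-first variant that inspects how many rows the UPDATE touched and retries when it touched none. It illustrates the re-check described above and is not a recommendation. The function name upsert_insert_first and its parameter names are hypothetical, and it assumes READ COMMITTED with no other unique constraints on the table.

```sql
-- Table definition repeated from the question so the sketch is self-contained.
CREATE TABLE IF NOT EXISTS testtable (
    id integer PRIMARY KEY,
    somedata text NOT NULL
);

-- Hypothetical helper: insert-first upsert that checks the UPDATE row count.
CREATE OR REPLACE FUNCTION upsert_insert_first(p_id integer, p_somedata text)
RETURNS void AS
$$
DECLARE
    rows_updated integer;
BEGIN
    LOOP
        -- Try the INSERT first; a duplicate key means the row exists
        -- (or a concurrent transaction has just committed it).
        BEGIN
            INSERT INTO testtable (id, somedata) VALUES (p_id, p_somedata);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            NULL;  -- fall through to the UPDATE below
        END;

        UPDATE testtable SET somedata = p_somedata WHERE id = p_id;
        GET DIAGNOSTICS rows_updated = ROW_COUNT;

        -- Zero rows means the conflicting row is gone again (for example
        -- it was deleted), so the upsert has not happened yet: retry.
        IF rows_updated > 0 THEN
            RETURN;
        END IF;
    END LOOP;
END;
$$
LANGUAGE plpgsql;

-- Usage:
SELECT upsert_insert_first(2, 'Joe');
SELECT upsert_insert_first(3, 'Alan');
```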

Some attempted solutions also fail to consider SELECT races. If you try the obvious and simple:

-- THIS IS WRONG. DO NOT COPY IT. It's an EXAMPLE.

BEGIN;

UPDATE testtable
SET somedata = 'blah'
WHERE id = 2;

-- Remember, this is WRONG. Do NOT COPY IT.

INSERT INTO testtable (id, somedata)
SELECT 2, 'blah'
WHERE NOT EXISTS (SELECT 1 FROM testtable WHERE testtable.id = 2);

COMMIT;

then when two run at once there are several failure modes. One is the already discussed issue with an update re-check. Another is where both UPDATE at the same time, matching zero rows and continuing. Then they both do the EXISTS test, which happens before the INSERT. Both get zero rows, so both do the INSERT. One fails with a duplicate key error.

This is why you need a re-try loop. You might think that you can prevent duplicate key errors or lost updates with clever SQL, but you can't. You need to check row counts or handle duplicate key errors (depending on the chosen approach) and re-try.

Please don't roll your own solution for this. Like with message queuing, it's probably wrong.

Sometimes you want to do a bulk upsert, where you have a new data set that you want to merge into an older existing data set. This is vastly more efficient than individual row upserts and should be preferred whenever practical.

In this case, you typically follow the following process:

  • CREATE a TEMPORARY table
  • COPY or bulk-insert the new data into the temp table
  • LOCK the target table IN EXCLUSIVE MODE. This permits other transactions to SELECT, but not make any changes to the table.
  • Do an UPDATE ... FROM of existing records using the values in the temp table;
  • Do an INSERT of rows that don't already exist in the target table;
  • COMMIT, releasing the lock.

For example, for the example given in the question, using multi-valued INSERT to populate the temp table:

BEGIN;

CREATE TEMPORARY TABLE newvals(id integer, somedata text);

INSERT INTO newvals(id, somedata) VALUES (2, 'Joe'), (3, 'Alan');

LOCK TABLE testtable IN EXCLUSIVE MODE;

UPDATE testtable
SET somedata = newvals.somedata
FROM newvals
WHERE newvals.id = testtable.id;

INSERT INTO testtable
SELECT newvals.id, newvals.somedata
FROM newvals
LEFT OUTER JOIN testtable ON (testtable.id = newvals.id)
WHERE testtable.id IS NULL;

COMMIT;
