INSERT ON CONFLICT DO NOTHING 和 SELECT 之间的竞争条件 [英] Race conditions between INSERT ON CONFLICT DO NOTHING and SELECT
问题描述
在给定默认事务隔离(已提交读)的情况下,在 INSERT ... ON CONFLICT DO NOTHING
语句之后的 SELECT
查询是否总能找到一行?
Does a SELECT
query following an INSERT … ON CONFLICT DO NOTHING
statement always find a row, given the default transaction isolation (read committed)?
我想在一个表中INSERT
-或-SELECT
一行,然后在第二个表中插入行时引用它.由于 RETURNING
不适用于 ON CONFLICT
,我到目前为止使用了 一个简单的 CTE,即使该行已经存在,它也应该始终为我提供标识列值:
I want to INSERT
-or-SELECT
a row in one table, then reference that when inserting rows in a second table. Since RETURNING
doesn't work well with ON CONFLICT
, I have so far used a simple CTE that should always give me the identity column value even if the row already exists:
$id = query(
`WITH ins AS (
INSERT INTO object (scope, name)
VALUES ($1, $2)
ON CONFLICT (scope, name) DO NOTHING
RETURNING id
)
SELECT id FROM ins
UNION ALL
SELECT id FROM object WHERE scope = $1 AND name = $2
LIMIT 1;`,
[$scope, $name]
)
query(
`INSERT INTO object_member (object_id, key, value)
SELECT $1, UNNEST($2::text[]), UNNEST($3::int[]);`
[$id, $keys, $values]
)
然而,我了解到这个 CTE 在并发写入负载下并不完全安全,可能发生当不同的事务插入同一行时,upsert 和 select 变为空.
However, I learned that this CTE is not entirely safe under concurrent write load, where it can happen that both the upsert and the select come up empty when a different transaction does insert the same row.
在那里的答案(以及这里)中,建议使用另一个查询来执行选择
:
In the answers there (and also here) it is suggested to use another query to do the SELECT
:
启动一个新命令(在同一个事务中),然后可以看到上一个查询中的这些冲突行.
start a new command (in the same transaction), which then can see these conflicting rows from the previous query.
如果我理解正确,那就意味着做
If I understand correctly, it would mean to do
$id = query(
`INSERT INTO object (scope, name)
VALUES ($1, $2)
ON CONFLICT (scope, name) DO NOTHING
RETURNING id;`,
[$scope, $name]
)
if not $id:
$id = query(
`SELECT id FROM object WHERE scope = $1 AND name = $2;`
[$scope, $name]
)
query(
`INSERT INTO object_member (object_id, key, value)
SELECT $1, UNNEST($2::text[]), UNNEST($3::int[]);`
[$id, $keys, $values]
)
甚至缩短为
query(
`INSERT INTO object (scope, name)
VALUES ($1, $2)
ON CONFLICT (scope, name) DO NOTHING;`,
[$scope, $name]
)
query(
`INSERT INTO object_member (object_id, key, value)
SELECT (SELECT id FROM object WHERE scope = $1 AND name = $2), UNNEST($3::text[]), UNNEST($3::int[]);`
[$scope, $name, $keys, $values]
)
我相信这足以防止出现特定的竞争条件(在这个答案中称为并发问题 1")) - 但我不能 100% 确定不会错过任何东西.
I believe this would be enough to prevent that particular race condition (dubbed "concurrency issue 1" in this answer) - but I'm not 100% certain not to have missed anything.
还有并发问题 2"呢?如果我理解正确,这是关于在 INSERT
和 SELECT
语句之间删除或更新现有行的另一个事务 - 并且在使用多个查询而不是使用多个查询时更有可能发生CTE 方法.我应该如何处理?我假设在第二个代码片段中需要使用 FOR KEY SHARE
锁定 SELECT
- 但我是否也需要在 id
在同一个查询中使用?如果它有助于简化答案,让我们假设 object
只能插入或删除,但永远不会更新.
Also what about "concurrency issue 2"? If I understand correctly, this is about another transaction deleting or updating the existing row, inbetween the INSERT
and SELECT
statements - and it's more likely to happen when using multiple queries instead of the CTE approach. How exactly should I deal with that? I assume locking the SELECT
with FOR KEY SHARE
is necessary in the second code snippet - but would I also need that in the third snippet where the id
is used within the same query? If it helps to simplify the answer, let's assume an object
can only be inserted or deleted, but is never updated.
推荐答案
为了确保第一个表中的单行在那里,并且它的 ID 返回,你可以创建一个像这里概述的函数:
To make absolutely sure that the single row in the first table is there, and it's ID returned, you could create a function like outlined here:
要确保该行在交易期间也保持,只需确保它已锁定.如果您 INSERT
该行,它无论如何都会被锁定.如果你 SELECT
一个现有的 id
,你必须明确地锁定它——就像你建议的那样.共享密钥
只要在 (scope, name)
上有一个(非部分的、非功能性的)UNIQUE
索引,它就足以满足我们的目的,可以安全地假设给定您的 ON CONFLICT
子句.
To make sure the row also stays there for the duration of the transaction, just make sure it's locked. If you INSERT
the row, it's locked anyway. If you SELECT
an existing id
, you have to lock it explicitly - just like you suggested. FOR KEY SHARE
is strong enough for our purpose as long as there is a (non-partial, non-functional) UNIQUE
index on (scope, name)
, which is safe to assume given your ON CONFLICT
clause.
CREATE OR REPLACE FUNCTION f_object_id(_scope text, _name text, OUT _object_id int)
LANGUAGE plpgsql AS
$func$
BEGIN
LOOP
SELECT id FROM object
WHERE scope = $1
AND name = $2
-- lock to prevent deletion in the tiny time frame before the next INSERT
FOR KEY SHARE
INTO _object_id;
EXIT WHEN FOUND;
INSERT INTO object AS o (scope, name)
VALUES ($1, $2)
ON CONFLICT (scope, name) DO NOTHING
RETURNING o.id
INTO _object_id;
EXIT WHEN FOUND;
END LOOP;
END
$func$;
如果可以想象并发事务可能会在 DELETE
它(你不会 UPDATE
)在 UPDATE
code>SELECT 和下一个 INSERT
语句.
You really only need to lock the row if it's conceivable that a concurrent transaction might DELETE
it (you don't UPDATE
) in the tiny time frame between the SELECT
and the next INSERT
statement.
此外,如果您有一个从 object_member.object_id
到 object.id
的 FOREIGN KEY
约束(这似乎很可能),则参照完整性是无论如何保证.如果您不添加显式锁,并且行在中间被删除,则会出现外键违规,并且取消了对 object_member
的 INSERT
,以及整个交易.否则,另一个带有 DELETE
的事务必须等到你的事务完成,然后被相同的 FK 约束取消,因为依赖的行现在在那里(除非它被定义为 CASCADE
...) 因此,通过锁定(或不锁定),您可以决定在这种情况下是否阻止 DELETE
或 INSERT
.
Also, if you have a FOREIGN KEY
constraint from object_member.object_id
to object.id
(which seems likely), referential integrity is guaranteed anyway. If you don't add the explicit lock, and the row is deleted in between, you get a foreign key violation, and the INSERT
to object_member
is cancelled, along with the whole transaction. Else, the other transaction with the DELETE
has to wait until your transaction is done, and is then cancelled by the same FK constraint since depending rows are now there (unless it's defined to CASCADE
...) So by locking (or not) you can decide whether to prevent the DELETE
or the INSERT
in this scenario.
然后你的电话就变成:
query(
`WITH o(id) AS (SELECT f_object_id($1, $2))
INSERT INTO object_member (object_id, key, value)
SELECT o.id, UNNEST($3::text[]), UNNEST($4::int[])
FROM o;`
[$scope, $name, $keys, $values]
)
由于您显然在 object_member
中插入了多行,我将 f_object_id($1, $2)
移动到 CTE 以避免重复执行 - 这会工作,但毫无意义地昂贵.
Since you obviously insert multiple rows into object_member
, I moved f_object_id($1, $2)
to a CTE to avoid repeated execution - which would work, but pointlessly expensive.
在 Postgres 12 或更高版本中,我会通过添加 MATERIALIZED
(因为 INSERT
隐藏在函数中):
In Postgres 12 or later I would make that explicit by adding MATERIALIZED
(since the INSERT
is hidden in a function):
WITH o(id) AS MATERIALIZED (SELECT f_object_id($1, $2)) ...
旁白:对于 SELECT
列表中的多个 unnest()
,请确保您使用的是 Postgres 10 或更高版本.见:
Aside: For the multiple unnest()
in the SELECT
list, make sure you are on Postgres 10 or later. See:
在同一事务中的多个查询的应用程序逻辑中执行此操作会有什么不同(除了执行时间)吗?
Will it make any difference (apart from execution time) to do this in the application logic with multiple queries in the same transaction?
基本上没有.唯一的区别是性能.嗯,短代码和可靠性.对于每个循环,在 db 和 client 之间来回切换客观上更容易出错.但除非您有极具竞争力的交易,否则您几乎永远不会循环.
Basically no. The only difference is performance. Well, and short code and reliability. It's objectively more error prone to go back and forth between db and client for each loop. But unless you have extremely competitive transactions, you would hardly ever be looping anyway.
另一个考虑是:这件事很棘手,大多数开发人员不理解.封装在服务器端函数中,它不太可能被下一个应用程序程序员(或您自己)破坏.你也必须确保它被实际使用.无论哪种方式,正确记录您以一种或另一种方式这样做的原因......
The other consideration is this: the matter is tricky, and most developers do not understand it. Encapsulated in a server-side function, it's less likely to be broken by the next application programmer (or yourself). You have to make sure that it's actually used, too. Either way, properly document the reasons you are doing it one way or another ...
我真的想知道我的第二个代码段是否安全,或者为什么不安全(考虑到 INSERT
之后的 SELECT
中关于可见性的引用).
I really wonder whether my second snippet is safe, or why not (given the quote about visibility in the
SELECT
after theINSERT
).
大部分安全,但并非绝对安全.虽然下一个单独的 SELECT
将看到(现在已提交)与前一个 UPSERT 竞争的事务的行,但没有什么可以阻止第三个事务在此期间再次删除它.该行没有被锁定,当它不可见时您无法锁定该行,并且 Postgres 中没有可用的通用谓词锁定.
Mostly safe, but not absolutely. While the next separate SELECT
will see (now committed) rows of a transactions competing with the previous UPSERT, there is nothing to keep a third transaction from deleting it again in the meantime. The row has not been locked, and you have no way to do that while it's not visible, and there is no generic predicate locking available in Postgres.
考虑一下(T1、T2、T3 是并发事务):
Consider this (T1, T2, T3 are concurrent transactions):
T2: BEGIN transaction
T1: BEGIN transaction
T2: INSERT object 666
T1: UPSERT object 666
unique violation?
-> wait for T2
T2: COMMIT
T1: unique violation -> NO ACTION
finish statement
can't return invisible object 666
T3: DELETE object 666 & COMMIT
T1: SELECT object 666 -> no row!
BOOM!
通常情况下,这种情况极不可能发生.
但这是可能的.因此循环.
Typically it's extremely unlikely that it ever happens.
But it's possible. Hence the loop.
另一个选项是 SERIALIZABLE
事务隔离.通常更昂贵,并且您需要为序列化失败做好准备.抓住 22.
The other option is SERIALIZABLE
transaction isolation. Typically more expensive, and you need to prepare for serialization failures. Catch 22.
这篇关于INSERT ON CONFLICT DO NOTHING 和 SELECT 之间的竞争条件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!