INSERT ON CONFLICT DO NOTHING 和 SELECT 之间的竞争条件 [英] Race conditions between INSERT ON CONFLICT DO NOTHING and SELECT

查看:50
本文介绍了INSERT ON CONFLICT DO NOTHING 和 SELECT 之间的竞争条件的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在给定默认事务隔离(已提交读)的情况下,在 INSERT ... ON CONFLICT DO NOTHING 语句之后的 SELECT 查询是否总能找到一行?

Does a SELECT query following an INSERT … ON CONFLICT DO NOTHING statement always find a row, given the default transaction isolation (read committed)?

我想在一个表中INSERT-或-SELECT 一行,然后在第二个表中插入行时引用它.由于 RETURNING 不适用于 ON CONFLICT,我到目前为止使用了 一个简单的 CTE,即使该行已经存在,它也应该始终为我提供标识列值:

I want to INSERT-or-SELECT a row in one table, then reference that when inserting rows in a second table. Since RETURNING doesn't work well with ON CONFLICT, I have so far used a simple CTE that should always give me the identity column value even if the row already exists:

$id = query(
  `WITH ins AS (
    INSERT INTO object (scope, name)
    VALUES ($1, $2)
    ON CONFLICT (scope, name) DO NOTHING
    RETURNING id
  )
  SELECT id FROM ins
  UNION  ALL
  SELECT id FROM object WHERE scope = $1 AND name = $2
  LIMIT 1;`,
  [$scope, $name]
)
query(
  `INSERT INTO object_member (object_id, key, value)
  SELECT $1, UNNEST($2::text[]), UNNEST($3::int[]);`
  [$id, $keys, $values]
)

然而,我了解到这个 CTE 在并发写入负载下并不完全安全,可能发生当不同的事务插入同一行时,upsert 和 select 变为空.

However, I learned that this CTE is not entirely safe under concurrent write load, where it can happen that both the upsert and the select come up empty when a different transaction does insert the same row.

在那里的答案(以及这里)中,建议使用另一个查询来执行选择:

In the answers there (and also here) it is suggested to use another query to do the SELECT:

启动一个新命令(在同一个事务中),然后可以看到上一个查询中的这些冲突行.

start a new command (in the same transaction), which then can see these conflicting rows from the previous query.

如果我理解正确,那就意味着做

If I understand correctly, it would mean to do

$id = query(
  `INSERT INTO object (scope, name)
  VALUES ($1, $2)
  ON CONFLICT (scope, name) DO NOTHING
  RETURNING id;`,
  [$scope, $name]
)
if not $id:
  $id = query(
    `SELECT id FROM object WHERE scope = $1 AND name = $2;`
    [$scope, $name]
  )
query(
  `INSERT INTO object_member (object_id, key, value)
  SELECT $1, UNNEST($2::text[]), UNNEST($3::int[]);`
  [$id, $keys, $values]
)

甚至缩短为

query(
  `INSERT INTO object (scope, name)
  VALUES ($1, $2)
  ON CONFLICT (scope, name) DO NOTHING;`,
  [$scope, $name]
)
query(
  `INSERT INTO object_member (object_id, key, value)
  SELECT (SELECT id FROM object WHERE scope = $1 AND name = $2), UNNEST($3::text[]), UNNEST($3::int[]);`
  [$scope, $name, $keys, $values]
)

我相信这足以防止出现特定的竞争条件(在这个答案中称为并发问题 1")) - 但我不能 100% 确定不会错过任何东西.

I believe this would be enough to prevent that particular race condition (dubbed "concurrency issue 1" in this answer) - but I'm not 100% certain not to have missed anything.

还有并发问题 2"呢?如果我理解正确,这是关于在 INSERTSELECT 语句之间删除或更新现有行的另一个事务 - 并且在使用多个查询而不是使用多个查询时更有可能发生CTE 方法.我应该如何处理?我假设在第二个代码片段中需要使用 FOR KEY SHARE 锁定 SELECT - 但我是否也需要在 id 在同一个查询中使用?如果它有助于简化答案,让我们假设 object 只能插入或删除,但永远不会更新.

Also what about "concurrency issue 2"? If I understand correctly, this is about another transaction deleting or updating the existing row, inbetween the INSERT and SELECT statements - and it's more likely to happen when using multiple queries instead of the CTE approach. How exactly should I deal with that? I assume locking the SELECT with FOR KEY SHARE is necessary in the second code snippet - but would I also need that in the third snippet where the id is used within the same query? If it helps to simplify the answer, let's assume an object can only be inserted or deleted, but is never updated.

推荐答案

为了确保第一个表中的单行在那里,并且它的 ID 返回,你可以创建一个像这里概述的函数:

To make absolutely sure that the single row in the first table is there, and it's ID returned, you could create a function like outlined here:

要确保该行在交易期间也保持,只需确保它已锁定.如果您 INSERT 该行,它无论如何都会被锁定.如果你 SELECT 一个现有的 id,你必须明确地锁定它——就像你建议的那样.共享密钥只要在 (scope, name) 上有一个(非部分的、非功能性的)UNIQUE 索引,它就足以满足我们的目的,可以安全地假设给定您的 ON CONFLICT 子句.

To make sure the row also stays there for the duration of the transaction, just make sure it's locked. If you INSERT the row, it's locked anyway. If you SELECT an existing id, you have to lock it explicitly - just like you suggested. FOR KEY SHARE is strong enough for our purpose as long as there is a (non-partial, non-functional) UNIQUE index on (scope, name), which is safe to assume given your ON CONFLICT clause.

CREATE OR REPLACE FUNCTION f_object_id(_scope text, _name text, OUT _object_id int)
  LANGUAGE plpgsql AS
$func$
BEGIN
LOOP
   SELECT id FROM object
   WHERE  scope = $1
   AND    name  = $2
   -- lock to prevent deletion in the tiny time frame before the next INSERT
   FOR    KEY SHARE
   INTO   _object_id;

   EXIT WHEN FOUND;

   INSERT INTO object AS o (scope, name)
   VALUES ($1, $2)
   ON     CONFLICT (scope, name) DO NOTHING
   RETURNING o.id
   INTO   _object_id;

   EXIT WHEN FOUND;
END LOOP;
END
$func$;

如果可以想象并发事务可能会在 DELETE 它(你不会 UPDATE)在 UPDATEcode>SELECT 和下一个 INSERT 语句.

You really only need to lock the row if it's conceivable that a concurrent transaction might DELETE it (you don't UPDATE) in the tiny time frame between the SELECT and the next INSERT statement.

此外,如果您有一个从 object_member.object_idobject.idFOREIGN KEY 约束(这似乎很可能),则参照完整性是无论如何保证.如果您不添加显式锁,并且行在中间被删除,则会出现外键违规,并且取消了对 object_memberINSERT,以及整个交易.否则,另一个带有 DELETE 的事务必须等到你的事务完成,然后被相同的 FK 约束取消,因为依赖的行现在在那里(除非它被定义为 CASCADE ...) 因此,通过锁定(或不锁定),您可以决定在这种情况下是否阻止 DELETEINSERT.

Also, if you have a FOREIGN KEY constraint from object_member.object_id to object.id (which seems likely), referential integrity is guaranteed anyway. If you don't add the explicit lock, and the row is deleted in between, you get a foreign key violation, and the INSERT to object_member is cancelled, along with the whole transaction. Else, the other transaction with the DELETE has to wait until your transaction is done, and is then cancelled by the same FK constraint since depending rows are now there (unless it's defined to CASCADE ...) So by locking (or not) you can decide whether to prevent the DELETE or the INSERT in this scenario.

然后你的电话就变成:

query(
  `WITH o(id) AS (SELECT f_object_id($1, $2))
   INSERT INTO object_member (object_id, key, value)
   SELECT o.id, UNNEST($3::text[]), UNNEST($4::int[])
   FROM   o;`
  [$scope, $name, $keys, $values]
)

由于您显然在 object_member 中插入了多行,我将 f_object_id($1, $2) 移动到 CTE 以避免重复执行 - 这会工作,但毫无意义地昂贵.

Since you obviously insert multiple rows into object_member, I moved f_object_id($1, $2) to a CTE to avoid repeated execution - which would work, but pointlessly expensive.

在 Postgres 12 或更高版本中,我会通过添加 MATERIALIZED(因为 INSERT 隐藏在函数中):

In Postgres 12 or later I would make that explicit by adding MATERIALIZED (since the INSERT is hidden in a function):

WITH o(id) AS MATERIALIZED (SELECT f_object_id($1, $2)) ...

旁白:对于 SELECT 列表中的多个 unnest(),请确保您使用的是 Postgres 10 或更高版本.见:

Aside: For the multiple unnest() in the SELECT list, make sure you are on Postgres 10 or later. See:

在同一事务中的多个查询的应用程序逻辑中执行此操作会有什么不同(除了执行时间)吗?

Will it make any difference (apart from execution time) to do this in the application logic with multiple queries in the same transaction?

基本上没有.唯一的区别是性能.嗯,短代码和可靠性.对于每个循环,在 db 和 client 之间来回切换客观上更容易出错.但除非您有极具竞争力的交易,否则您几乎永远不会循环.

Basically no. The only difference is performance. Well, and short code and reliability. It's objectively more error prone to go back and forth between db and client for each loop. But unless you have extremely competitive transactions, you would hardly ever be looping anyway.

另一个考虑是:这件事很棘手,大多数开发人员不理解.封装在服务器端函数中,它不太可能被下一个应用程序程序员(或您自己)破坏.你也必须确保它被实际使用.无论哪种方式,正确记录您以一种或另一种方式这样做的原因......

The other consideration is this: the matter is tricky, and most developers do not understand it. Encapsulated in a server-side function, it's less likely to be broken by the next application programmer (or yourself). You have to make sure that it's actually used, too. Either way, properly document the reasons you are doing it one way or another ...

我真的想知道我的第二个代码段是否安全,或者为什么不安全(考虑到 INSERT 之后的 SELECT 中关于可见性的引用).

I really wonder whether my second snippet is safe, or why not (given the quote about visibility in the SELECT after the INSERT).

大部分安全,但并非绝对安全.虽然下一个单独的 SELECT 将看到(现在已提交)与前一个 UPSERT 竞争的事务的行,但没有什么可以阻止第三个事务在此期间再次删除它.该行没有被锁定,当它不可见时您无法锁定该行,并且 Postgres 中没有可用的通用谓词锁定.

Mostly safe, but not absolutely. While the next separate SELECT will see (now committed) rows of a transactions competing with the previous UPSERT, there is nothing to keep a third transaction from deleting it again in the meantime. The row has not been locked, and you have no way to do that while it's not visible, and there is no generic predicate locking available in Postgres.

考虑一下(T1、T2、T3 是并发事务):

Consider this (T1, T2, T3 are concurrent transactions):

                               T2: BEGIN transaction
T1: BEGIN transaction
                               T2: INSERT object 666
T1: UPSERT object 666
    unique violation?
    -> wait for T2
                               T2: COMMIT
T1: unique violation -> NO ACTION
    finish statement
    can't return invisible object 666
                                             T3: DELETE object 666 & COMMIT
T1: SELECT object 666 -> no row!
    BOOM!

通常情况下,这种情况极不可能发生.
但这是可能的.因此循环.

Typically it's extremely unlikely that it ever happens.
But it's possible. Hence the loop.

另一个选项是 SERIALIZABLE 事务隔离.通常更昂贵,并且您需要为序列化失败做好准备.抓住 22.

The other option is SERIALIZABLE transaction isolation. Typically more expensive, and you need to prepare for serialization failures. Catch 22.

这篇关于INSERT ON CONFLICT DO NOTHING 和 SELECT 之间的竞争条件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆