函数中的 SELECT 或 INSERT 是否容易出现竞争条件? [英] Is SELECT or INSERT in a function prone to race conditions?

查看:21
本文介绍了函数中的 SELECT 或 INSERT 是否容易出现竞争条件?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我编写了一个函数来为一个简单的博客引擎创建帖子:

I wrote a function to create posts for a simple blogging engine:

CREATE FUNCTION CreatePost(VARCHAR, TEXT, VARCHAR[])
RETURNS INTEGER AS $$
    DECLARE
        InsertedPostId INTEGER;
        TagName VARCHAR;
    BEGIN
        INSERT INTO Posts (Title, Body)
        VALUES ($1, $2)
        RETURNING Id INTO InsertedPostId;

        FOREACH TagName IN ARRAY $3 LOOP
            DECLARE
                InsertedTagId INTEGER;
            BEGIN
                -- I am concerned about this part.
                BEGIN
                    INSERT INTO Tags (Name)
                    VALUES (TagName)
                    RETURNING Id INTO InsertedTagId;
                EXCEPTION WHEN UNIQUE_VIOLATION THEN
                    SELECT INTO InsertedTagId Id
                    FROM Tags
                    WHERE Name = TagName
                    FETCH FIRST ROW ONLY;
                END;

                INSERT INTO Taggings (PostId, TagId)
                VALUES (InsertedPostId, InsertedTagId);
            END;
        END LOOP;

        RETURN InsertedPostId;
    END;
$$ LANGUAGE 'plpgsql';

当多个用户同时删除标签和创建帖子时,这是否容易出现竞争条件?
具体来说,事务(以及函数)是否可以防止这种竞争条件的发生?
我使用的是 PostgreSQL 9.2.3.

Is this prone to race conditions when multiple users delete tags and create posts at the same time?
Specifically, do transactions (and thus functions) prevent such race conditions from happening?
I'm using PostgreSQL 9.2.3.

推荐答案

SELECT or INSERT 在可能的并发写入负载下,与(但不同于)UPSERT(即 INSERTUPDATE>).

It's the recurring problem of SELECT or INSERT under possible concurrent write load, related to (but different from) UPSERT (which is INSERT or UPDATE).

这个 PL/pgSQL 函数使用 UPSERT(INSERT ... ON CONFLICT .. DO UPDATE)INSERTSELECT 单行:

This PL/pgSQL function uses UPSERT (INSERT ... ON CONFLICT .. DO UPDATE) to INSERT or SELECT a single row:

CREATE OR REPLACE FUNCTION f_tag_id(_tag text, OUT _tag_id int)
  LANGUAGE plpgsql AS
$func$
BEGIN
   SELECT tag_id  -- only if row existed before
   FROM   tag
   WHERE  tag = _tag
   INTO   _tag_id;

   IF NOT FOUND THEN
      INSERT INTO tag AS t (tag)
      VALUES (_tag)
      ON     CONFLICT (tag) DO NOTHING
      RETURNING t.tag_id
      INTO   _tag_id;
   END IF;
END
$func$;

竞争条件仍然存在很小的窗口.为了绝对确定,我们获得了一个 ID:

There is still a tiny window for a race condition. To make absolutely sure we get an ID:

CREATE OR REPLACE FUNCTION f_tag_id(_tag text, OUT _tag_id int)
  LANGUAGE plpgsql AS
$func$
BEGIN
   LOOP
      SELECT tag_id
      FROM   tag
      WHERE  tag = _tag
      INTO   _tag_id;

      EXIT WHEN FOUND;

      INSERT INTO tag AS t (tag)
      VALUES (_tag)
      ON     CONFLICT (tag) DO NOTHING
      RETURNING t.tag_id
      INTO   _tag_id;

      EXIT WHEN FOUND;
   END LOOP;
END
$func$;

db<>fiddle 这里

这会一直循环直到 INSERTSELECT 成功.调用:

This keeps looping until either INSERT or SELECT succeeds. Call:

SELECT f_tag_id('possibly_new_tag');

如果后续命令在同一个事务中依赖于该行的存在并且实际上有可能其他事务同时更新或删除它,则可以在SELECT中锁定现有行 声明带有 FOR分享.
如果该行被插入,则它会被锁定(或对其他事务不可见),直到事务结束.

If subsequent commands in the same transaction rely on the existence of the row and it is actually possible that other transactions update or delete it concurrently, you can lock an existing row in the SELECT statement with FOR SHARE.
If the row gets inserted instead, it is locked (or not visible for other transactions) until the end of the transaction anyway.

从常见情况开始(INSERTSELECT)以使其更快.

Start with the common case (INSERT vs SELECT) to make it faster.

相关:

INSERTSELECT 多行(一组)的相关(纯SQL)解决方案:

Related (pure SQL) solution to INSERT or SELECT multiple rows (a set) at once:

CREATE OR REPLACE FUNCTION f_tag_id(_tag text, OUT _tag_id int)
  LANGUAGE sql AS
$func$
WITH ins AS (
   INSERT INTO tag AS t (tag)
   VALUES (_tag)
   ON     CONFLICT (tag) DO NOTHING
   RETURNING t.tag_id
   )
SELECT tag_id FROM ins
UNION  ALL
SELECT tag_id FROM tag WHERE tag = _tag
LIMIT  1;
$func$;

并非完全错误,但它未能堵住漏洞,例如 @FunctorSalad 解决了.如果并发事务尝试同时执行相同操作,则该函数可能会得出空结果.手册:

Not entirely wrong, but it fails to seal a loophole, like @FunctorSalad worked out. The function can come up with an empty result if a concurrent transaction tries to do the same at the same time. The manual:

所有语句都用同一个快照执行

All the statements are executed with the same snapshot

如果并发事务稍早插入了相同的新标签,但尚未提交:

If a concurrent transaction inserts the same new tag a moment earlier, but hasn't committed, yet:

  • UPSERT 部分在等​​待并发事务完成后变为空.(如果并发事务应该回滚,它仍然会插入新标签并返回一个新 ID.)

  • The UPSERT part comes up empty, after waiting for the concurrent transaction to finish. (If the concurrent transaction should roll back, it still inserts the new tag and returns a new ID.)

SELECT 部分也为空,因为它基于相同的快照,其中来自(尚未提交的)并发事务的新标记不可见.

The SELECT part also comes up empty, because it's based on the same snapshot, where the new tag from the (yet uncommitted) concurrent transaction is not visible.

我们一无所获.不像预期的那样.这与幼稚的逻辑背道而驰(我在那里被抓住了),但这就是 Postgres 的 MVCC 模型的工作方式 - 必须工作.

We get nothing. Not as intended. That's counter-intuitive to naive logic (and I got caught there), but that's how the MVCC model of Postgres works - has to work.

因此如果多个事务可以尝试同时插入相同的标签,请不要使用此选项. 循环直到你真正得到一行.无论如何,在普通工作负载中几乎不会触发循环.

So do not use this if multiple transactions can try to insert the same tag at the same time. Or loop until you actually get a row. The loop will hardly ever be triggered in common work loads anyway.

鉴于这个(稍微简化的)表格:

Given this (slightly simplified) table:

CREATE table tag (
  tag_id serial PRIMARY KEY
, tag    text   UNIQUE
);

插入新标签/选择现有标签的几乎 100% 安全功能,看起来像这样.

An almost 100% secure function to insert new tag / select existing one, could look like this.

CREATE OR REPLACE FUNCTION f_tag_id(_tag text, OUT tag_id int)
  LANGUAGE plpgsql AS
$func$
BEGIN
   LOOP
      BEGIN
      WITH sel AS (SELECT t.tag_id FROM tag t WHERE t.tag = _tag FOR SHARE)
         , ins AS (INSERT INTO tag(tag)
                   SELECT _tag
                   WHERE  NOT EXISTS (SELECT 1 FROM sel)  -- only if not found
                   RETURNING tag.tag_id)       -- qualified so no conflict with param
      SELECT sel.tag_id FROM sel
      UNION  ALL
      SELECT ins.tag_id FROM ins
      INTO   tag_id;

      EXCEPTION WHEN UNIQUE_VIOLATION THEN     -- insert in concurrent session?
         RAISE NOTICE 'It actually happened!'; -- hardly ever happens
      END;

      EXIT WHEN tag_id IS NOT NULL;            -- else keep looping
   END LOOP;
END
$func$;

db<>fiddle 这里br/>sqlfiddle

为什么不是 100%?考虑相关 UPSERT 示例的手册中的注释:

Why not 100%? Consider the notes in the manual for the related UPSERT example:

  • 尝试 SELECT 首先.通过这种方式,您可以在 99.99% 的情况下避免相当昂贵的异常处理.

  • Try the SELECT first. This way you avoid the considerably more expensive exception handling 99.99% of the time.

使用 CTE 来最小化(已经tiny) 竞争条件的时间段.

Use a CTE to minimize the (already tiny) time slot for the race condition.

SELECTINSERT 在一个查询中之间的时间窗口非常小.如果你没有很重的并发负载,或者如果你能忍受一年一次的异常,你可以忽略这种情况并使用更快的 SQL 语句.

The time window between the SELECT and the INSERT within one query is super tiny. If you don't have heavy concurrent load, or if you can live with an exception once a year, you could just ignore the case and use the SQL statement, which is faster.

不需要只取第一行(= LIMIT 1).标签名称显然是UNIQUE.

No need for FETCH FIRST ROW ONLY (= LIMIT 1). The tag name is obviously UNIQUE.

删除FOR SHARE 在我的例子中如果你通常没有并发 DELETEUPDATE>标签.性能略有下降.

Remove FOR SHARE in my example if you don't usually have concurrent DELETE or UPDATE on the table tag. Costs a tiny bit of performance.

切勿引用语言名称:'plpgsql'.plpgsql 是一个标识符.引用可能会导致问题,并且仅允许向后兼容.

Never quote the language name: 'plpgsql'. plpgsql is an identifier. Quoting may cause problems and is only tolerated for backwards compatibility.

不要使用非描述性的列名称,例如 idname.当加入几个表时(这是你在关系数据库中所做的),你最终会得到多个相同的名称并且必须使用别名.

Don't use non-descriptive column names like id or name. When joining a couple of tables (which is what you do in a relational DB) you end up with multiple identical names and have to use aliases.

使用这个函数你可以在很大程度上简化你的FOREACH LOOP:

Using this function you could largely simplify your FOREACH LOOP to:

...
FOREACH TagName IN ARRAY $3
LOOP
   INSERT INTO taggings (PostId, TagId)
   VALUES   (InsertedPostId, f_tag_id(TagName));
END LOOP;
...

但是,作为带有 unnest():

Faster, though, as a single SQL statement with unnest():

INSERT INTO taggings (PostId, TagId)
SELECT InsertedPostId, f_tag_id(tag)
FROM   unnest($3) tag;

替换整个循环.

这个变体建立在带有 LIMIT 子句的 UNION ALL 的行为之上:只要找到足够的行,其余的就永远不会执行:

This variant builds on the behavior of UNION ALL with a LIMIT clause: as soon as enough rows are found, the rest is never executed:

在此基础上,我们可以将 INSERT 外包到一个单独的函数中.只有在那里我们需要异常处理.与第一个解决方案一样安全.

Building on this, we can outsource the INSERT into a separate function. Only there we need exception handling. Just as safe as the first solution.

CREATE OR REPLACE FUNCTION f_insert_tag(_tag text, OUT tag_id int)
  RETURNS int
  LANGUAGE plpgsql AS
$func$
BEGIN
   INSERT INTO tag(tag) VALUES (_tag) RETURNING tag.tag_id INTO tag_id;

   EXCEPTION WHEN UNIQUE_VIOLATION THEN  -- catch exception, NULL is returned
END
$func$;

主函数中用到的:

CREATE OR REPLACE FUNCTION f_tag_id(_tag text, OUT _tag_id int)
   LANGUAGE plpgsql AS
$func$
BEGIN
   LOOP
      SELECT tag_id FROM tag WHERE tag = _tag
      UNION  ALL
      SELECT f_insert_tag(_tag)  -- only executed if tag not found
      LIMIT  1  -- not strictly necessary, just to be clear
      INTO   _tag_id;

      EXIT WHEN _tag_id IS NOT NULL;  -- else keep looping
   END LOOP;
END
$func$;

  • 如果大多数调用只需要 SELECT,这会便宜一点,因为 INSERT 包含 EXCEPTION 的更昂贵的块> 子句很少输入.查询也更简单.

    • This is a bit cheaper if most of the calls only need SELECT, because the more expensive block with INSERT containing the EXCEPTION clause is rarely entered. The query is also simpler.

      FOR SHARE 在这里是不可能的(UNION 查询中不允许).

      FOR SHARE is not possible here (not allowed in UNION query).

      LIMIT 1 不是必需的(在 pg 9.4 中测试).Postgres 从 INTO _tag_id 派生 LIMIT 1 并且只执行直到找到第一行.

      LIMIT 1 would not be necessary (tested in pg 9.4). Postgres derives LIMIT 1 from INTO _tag_id and only executes until the first row is found.

      这篇关于函数中的 SELECT 或 INSERT 是否容易出现竞争条件?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆