Generate DEFAULT values in a CTE UPSERT using PostgreSQL 9.3


Question

I find using writable CTEs to emulate an upsert in PostgreSQL quite an elegant solution until we get an actual upsert/merge in Postgres (see: https://stackoverflow.com/a/8702291/558819).

However, there is one problem: how can I insert the default value? Using NULL won't help, of course, because NULL gets inserted explicitly as NULL, unlike in MySQL, for example. An example:

WITH new_values (id, playlist, item, group_name, duration, sort, legacy) AS (
    VALUES (651, 21, 30012, 'a', 30, 1, FALSE)
    ,      (NULL::int, 21, 1, 'b', 34, 2, NULL::boolean)
    ,      (668, 21, 30012, 'c', 30, 3, FALSE)
    ,      (7428, 21, 23068, 'd', 0, 4, FALSE)
), upsert AS (
    UPDATE playlist_items m
    SET    (playlist, item, group_name, duration, sort, legacy)
       = (nv.playlist, nv.item, nv.group_name, nv.duration, nv.sort, nv.legacy)
    FROM   new_values nv
    WHERE  nv.id = m.id
    RETURNING m.id
)
INSERT INTO playlist_items (playlist, item, group_name, duration, sort, legacy)
SELECT playlist, item, group_name, duration, sort, legacy
FROM   new_values nv
WHERE  NOT EXISTS (SELECT 1
                   FROM   upsert m
                   WHERE  nv.id = m.id)
RETURNING id

So I'd like, for example, the legacy column to take on its default value for the second VALUES row.

I've tried a few things, such as explicitly using DEFAULT in the VALUES list, which doesn't work because the CTE has no idea what it's inserting into. I've also tried coalesce(col, DEFAULT) in the INSERT statement, which didn't work either. So, is it possible to do what I want?

Recommended answer

Postgres 9.5 implemented UPSERT. See below.

This is a tricky problem. You are running into this restriction (per documentation):

In a VALUES list appearing at the top level of an INSERT, an expression can be replaced by DEFAULT to indicate that the destination column's default value should be inserted. DEFAULT cannot be used when VALUES appears in other contexts.

The last sentence of that quote is the key point. Default values are not defined without a table to insert into. So there is no direct solution to your question, but there are a number of possible alternative routes, depending on exact requirements.
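The restriction can be reproduced in isolation. The following is only a minimal sketch: the table demo_defaults is a hypothetical stand-in, not part of the question, and the exact wording of the error message may differ between versions.

```sql
-- Hypothetical demo table (an assumption for illustration only).
CREATE TEMP TABLE demo_defaults (
   id     serial PRIMARY KEY
 , legacy boolean DEFAULT FALSE
);

-- Works: VALUES sits at the top level of the INSERT,
-- so DEFAULT can be resolved against the target column.
INSERT INTO demo_defaults (legacy)
VALUES (DEFAULT)
RETURNING *;

-- Expected to fail: inside a CTE the VALUES list has no target column.
-- Postgres should reject it with an error along the lines of
-- "DEFAULT is not allowed in this context".
WITH v (legacy) AS (
   VALUES (DEFAULT)
   )
INSERT INTO demo_defaults (legacy)
SELECT legacy FROM v;
```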

You could fetch the default expressions from the system catalog pg_attrdef, as @Patrick commented, or from information_schema.columns.
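As a rough sketch of what such a lookup might look like, either of the following queries lists the default expressions of playlist_items. Both assume the table is visible in the current search_path. The results are text representations of expressions, not values.

```sql
-- Variant 1: information_schema (portable, but matches by table name only;
-- add a table_schema filter if the name exists in more than one schema).
SELECT column_name, column_default
FROM   information_schema.columns
WHERE  table_name = 'playlist_items'
ORDER  BY ordinal_position;

-- Variant 2: system catalogs; pg_get_expr() decompiles the stored expression.
SELECT a.attname
     , pg_get_expr(d.adbin, d.adrelid) AS default_expr
FROM   pg_catalog.pg_attribute a
JOIN   pg_catalog.pg_attrdef   d ON d.adrelid = a.attrelid
                                AND d.adnum   = a.attnum
WHERE  a.attrelid = 'playlist_items'::regclass
AND    a.attnum > 0            -- skip system columns
AND    NOT a.attisdropped      -- skip dropped columns
ORDER  BY a.attnum;
```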

But then you still only have a list of rows with a text representation of the expression that produces the default value. You would have to build and execute statements dynamically to get values to work with. Tedious and messy. Instead, we can let built-in Postgres functionality do that for us:

Insert a dummy row and have it returned to use the generated defaults:

INSERT INTO playlist_items DEFAULT VALUES RETURNING *;

Problems / scope of the solution

  • This is only guaranteed to work for STABLE or IMMUTABLE default expressions. Most VOLATILE functions will work just as well, but there are no guarantees. The current_timestamp family of functions qualifies as stable, since their values do not change within a transaction.
    In particular, this has side effects on serial columns (or any other defaults drawing from a sequence). But that should not be a problem, because you don't normally write to serial columns directly. Those shouldn't be listed in INSERT statements at all.
    Remaining flaw for serial columns: the sequence is still advanced by the single call to get a default row, producing a gap in the numbering. Again, that should not be a problem, because gaps are generally to be expected in serial columns.
  • Two more problems can be solved:

      • If you have columns defined NOT NULL, you have to insert dummy values and replace them with NULL in the result.
      • We do not actually want to insert the dummy row. We could delete it later (in the same transaction), but that may have more side effects, like ON DELETE triggers. There is a better way:

      Clone a temporary table including column defaults and insert into that:

      BEGIN;
      CREATE TEMP TABLE tmp_playlist_items (LIKE playlist_items INCLUDING DEFAULTS)
         ON COMMIT DROP;  -- drop at end of transaction
      
      INSERT INTO tmp_playlist_items DEFAULT VALUES RETURNING *;
      ...
      

      Same result, fewer side effects. Since default expressions are copied verbatim, the clone draws from the same sequences, if any. But other side effects from the unwanted row or triggers are avoided completely.

      Credit to Igor for the idea.

      You would have to provide dummy values for NOT NULL columns, because (per documentation):

      Not-null constraints are always copied to the new table.

      Either accommodate those in the INSERT statement or (better) eliminate the constraints:

      ALTER TABLE tmp_playlist_items
         ALTER COLUMN foo DROP NOT NULL
       , ALTER COLUMN bar DROP NOT NULL;
      

      There is a quick and dirty way with superuser privileges:

      UPDATE pg_attribute
      SET    attnotnull = FALSE
      WHERE  attrelid = 'tmp_playlist_items'::regclass
      AND    attnotnull
      AND    attnum > 0;
      

      It is just a temporary table with no data and no other purpose, and it is dropped at the end of the transaction. So the shortcut is tempting. Still, the basic rule is: never tamper with system catalogs directly.

      So, let's look into a clean way: automate with dynamic SQL in a DO statement. You only need regular privileges, which you are guaranteed to have since the same role created the temp table.

      DO $$BEGIN
      EXECUTE (
         SELECT 'ALTER TABLE tmp_playlist_items ALTER '
             || string_agg(quote_ident(attname), ' DROP NOT NULL, ALTER ')
             || ' DROP NOT NULL'
         FROM   pg_catalog.pg_attribute
         WHERE  attrelid = 'tmp_playlist_items'::regclass
         AND    attnotnull
         AND    attnum > 0
         );
      END$$;
      

      Much cleaner and still very fast. Exercise care with dynamic commands and be wary of SQL injection. This statement is safe. I have posted several related answers with more explanation.

      Complete solution for Postgres 9.4 or older:

      BEGIN;
      
      CREATE TEMP TABLE tmp_playlist_items
         (LIKE playlist_items INCLUDING DEFAULTS) ON COMMIT DROP;
      
      DO $$BEGIN
      EXECUTE (
         SELECT 'ALTER TABLE tmp_playlist_items ALTER '
             || string_agg(quote_ident(attname), ' DROP NOT NULL, ALTER ')
             || ' DROP NOT NULL'
         FROM   pg_catalog.pg_attribute
         WHERE  attrelid = 'tmp_playlist_items'::regclass
         AND    attnotnull
         AND    attnum > 0
         );
      END$$;
      
      LOCK TABLE playlist_items IN EXCLUSIVE MODE;  -- forbid concurrent writes
      
      WITH default_row AS (
         INSERT INTO tmp_playlist_items DEFAULT VALUES RETURNING *
         )
      , new_values (id, playlist, item, group_name, duration, sort, legacy) AS (
         VALUES
            (651, 21, 30012, 'a', 30, 1, FALSE)
          , (NULL, 21, 1, 'b', 34, 2, NULL)
          , (668, 21, 30012, 'c', 30, 3, FALSE)
          , (7428, 21, 23068, 'd', 0, 4, FALSE)
         )
      , upsert AS (  -- *not* replacing existing values in UPDATE (?)
         UPDATE playlist_items m
         SET   (  playlist,   item,   group_name,   duration,   sort,   legacy)
             = (n.playlist, n.item, n.group_name, n.duration, n.sort, n.legacy)
         --                                   ..., COALESCE(n.legacy, m.legacy)  -- see below
         FROM   new_values n
         WHERE  n.id = m.id
         RETURNING m.id
         )
      INSERT INTO playlist_items
              (playlist,   item,   group_name,   duration,   sort, legacy)
      SELECT n.playlist, n.item, n.group_name, n.duration, n.sort
                                         , COALESCE(n.legacy, d.legacy)
      FROM   new_values n, default_row d   -- single row can be cross-joined
      WHERE  NOT EXISTS (SELECT 1 FROM upsert u WHERE u.id = n.id)
      RETURNING id;
      
      COMMIT;

      You only need the LOCK if you have concurrent transactions trying to write to the same table.

      As requested, this only replaces NULL values in the column legacy in the input rows for the INSERT case. It can easily be extended to work for other columns or in the UPDATE case as well. For instance, you could UPDATE conditionally as well: only if the input value is NOT NULL. I added a commented line to the UPDATE above.

      Aside: You do not need to cast values in any row but the first in a VALUES expression, since types are derived from the first row.
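A quick way to check the aside for the two columns in question is to ask Postgres which types it resolved. This is only an illustrative sketch; the alias v and the two-column shape are assumptions, trimmed down from new_values above.

```sql
-- The first row carries typed literals, the second only bare NULLs.
-- pg_typeof() should report integer and boolean for both rows,
-- so the NULLs in the later row need no explicit cast.
SELECT pg_typeof(id)     AS id_type
     , pg_typeof(legacy) AS legacy_type
FROM  (
   VALUES (651 , FALSE)
        , (NULL, NULL)
   ) AS v (id, legacy);
```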

      Postgres 9.5 implements UPSERT with INSERT .. ON CONFLICT .. DO NOTHING | UPDATE. This largely simplifies the operation:

      INSERT INTO playlist_items AS m (id, playlist, item, group_name, duration, sort, legacy)
      VALUES (651, 21, 30012, 'a', 30, 1, FALSE)
      ,      (DEFAULT, 21, 1, 'b', 34, 2, DEFAULT)  -- !
      ,      (668, 21, 30012, 'c', 30, 3, FALSE)
      ,      (7428, 21, 23068, 'd', 0, 4, FALSE)
      ON CONFLICT (id) DO UPDATE
      SET (playlist, item, group_name, duration, sort, legacy)
       = (EXCLUDED.playlist, EXCLUDED.item, EXCLUDED.group_name
        , EXCLUDED.duration, EXCLUDED.sort, EXCLUDED.legacy)
      -- (...,  COALESCE(m.legacy, EXCLUDED.legacy))  -- see below
      RETURNING m.id;
      

      We can attach the VALUES clause to INSERT directly, which allows the DEFAULT keyword. In the case of unique violations on (id), Postgres updates instead. We can use excluded rows in the UPDATE. The manual:

      The SET and WHERE clauses in ON CONFLICT DO UPDATE have access to the existing row using the table's name (or an alias), and to rows proposed for insertion using the special excluded table.

      And:

      Note that the effects of all per-row BEFORE INSERT triggers are reflected in excluded values, since those effects may have contributed to the row being excluded from insertion.

      Remaining corner case

      You have various options for the UPDATE. You can ...

      • ... not update at all: add a WHERE clause to the UPDATE to only write to selected rows.
      • ... only update selected columns.
      • ... only update if the column is currently NULL: COALESCE(m.legacy, EXCLUDED.legacy)
      • ... only update if the new value is NOT NULL: COALESCE(EXCLUDED.legacy, m.legacy)

      But there is no way to distinguish DEFAULT values from values actually provided in the INSERT. Only the resulting EXCLUDED rows are visible. If you need the distinction, fall back to the previous solution, where both are at your disposal.
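To make one of these options concrete, here is a small self-contained sketch of the "only update if the new value is NOT NULL" variant. It assumes Postgres 9.5 or later, and demo_items is a hypothetical, cut-down stand-in for playlist_items.

```sql
-- Hypothetical demo table (an assumption for illustration only).
CREATE TEMP TABLE demo_items (
   id     serial PRIMARY KEY
 , sort   int
 , legacy boolean DEFAULT FALSE
);

-- Existing row; id is drawn from the sequence, so it gets id = 1.
INSERT INTO demo_items (sort, legacy) VALUES (1, TRUE);

INSERT INTO demo_items AS m (id, sort, legacy)
VALUES (1      , 2, NULL)      -- conflicts with the existing row
     , (DEFAULT, 3, DEFAULT)   -- new row, takes the column defaults
ON CONFLICT (id) DO UPDATE
SET    sort   = EXCLUDED.sort
     , legacy = COALESCE(EXCLUDED.legacy, m.legacy)  -- keep old value if input is NULL
RETURNING m.id, m.sort, m.legacy;
```

The first input row should update sort to 2 while leaving legacy at TRUE, and the second should be inserted with id 2 and legacy FALSE. Note that the statement cannot tell whether FALSE came from DEFAULT or was supplied explicitly, which is the limitation described above.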
