Generate DEFAULT values in a CTE UPSERT using PostgreSQL 9.3
Question
I find using writable CTEs to emulate an upsert in PostgreSQL quite an elegant solution until we get actual upsert/merge in Postgres. (see: https://stackoverflow.com/a/8702291/558819)
However, there is one problem: how can I insert the default value? Using NULL won't help, of course, as NULL gets explicitly inserted as NULL, unlike, for example, with MySQL. An example:
WITH new_values (id, playlist, item, group_name, duration, sort, legacy) AS (
VALUES (651, 21, 30012, 'a', 30, 1, FALSE)
, (NULL::int, 21, 1, 'b', 34, 2, NULL::boolean)
, (668, 21, 30012, 'c', 30, 3, FALSE)
, (7428, 21, 23068, 'd', 0, 4, FALSE)
), upsert AS (
UPDATE playlist_items m
SET (playlist, item, group_name, duration, sort, legacy)
= (nv.playlist, nv.item, nv.group_name, nv.duration, nv.sort, nv.legacy)
FROM new_values nv
WHERE nv.id = m.id
RETURNING m.id
)
INSERT INTO playlist_items (playlist, item, group_name, duration, sort, legacy)
SELECT playlist, item, group_name, duration, sort, legacy
FROM new_values nv
WHERE NOT EXISTS (SELECT 1
FROM upsert m
WHERE nv.id = m.id)
RETURNING id
So I'd like, for example, the legacy column to take on its default value for the second VALUES row.
I've tried a few things, such as explicitly using DEFAULT in the VALUES list, which doesn't work because the CTE has no idea what it's inserting into. I've also tried coalesce(col, DEFAULT) in the INSERT statement, which didn't seem to work either. So, is it possible to do what I want?
Recommended answer
Postgres 9.5 implements UPSERT. See below.
This is a tricky problem. You are running into this restriction (per documentation):
In a VALUES list appearing at the top level of an INSERT, an expression can be replaced by DEFAULT to indicate that the destination column's default value should be inserted. DEFAULT cannot be used when VALUES appears in other contexts.
Bold emphasis mine. Default values are not defined without a table to insert into. So there is no direct solution to your question, but there are a number of possible alternative routes, depending on exact requirements.
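The restriction can be reproduced with a small sketch. The table demo_defaults is hypothetical and exists only for illustration; it is not part of the question's schema.

```sql
-- Hypothetical demo table, not part of the original schema.
CREATE TEMP TABLE demo_defaults (
   id     serial  PRIMARY KEY
 , legacy boolean DEFAULT false
);

-- Allowed: VALUES sits at the top level of the INSERT,
-- so DEFAULT resolves to the column default of legacy.
INSERT INTO demo_defaults (legacy) VALUES (DEFAULT), (true);

-- Rejected with an error: here VALUES lives inside a CTE, detached from any
-- target table, so there is no column default that DEFAULT could refer to.
-- WITH v (legacy) AS (VALUES (DEFAULT))
-- INSERT INTO demo_defaults (legacy) SELECT legacy FROM v;
```

The failing statement is left commented out so the block can be run as a whole.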
You could fetch those from the system catalog pg_attrdef like @Patrick commented or from information_schema.columns. Complete instructions here:
But then you still only have a list of rows with a text representation of the expression to cook the default value. You would have to build and execute statements dynamically to get values to work with. Tedious and messy. Instead, we can let built-in Postgres functionality do that for us:
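For reference, a minimal sketch of the catalog lookup mentioned above. It assumes playlist_items lives in schema public; both queries return the default expressions as text only, which is exactly the limitation described.

```sql
-- Column defaults as text, from the information schema.
-- Assumption: the table lives in schema "public".
SELECT column_name, column_default
FROM   information_schema.columns
WHERE  table_schema = 'public'
AND    table_name   = 'playlist_items'
ORDER  BY ordinal_position;

-- The same information from the system catalogs.
-- pg_get_expr() decompiles the stored default expression into text.
SELECT a.attname, pg_get_expr(d.adbin, d.adrelid) AS default_expr
FROM   pg_catalog.pg_attribute a
JOIN   pg_catalog.pg_attrdef   d ON d.adrelid = a.attrelid
                                AND d.adnum   = a.attnum
WHERE  a.attrelid = 'playlist_items'::regclass
AND    a.attnum > 0
AND    NOT a.attisdropped
ORDER  BY a.attnum;
```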
Insert a dummy row and have it returned to use generated defaults:
INSERT INTO playlist_items DEFAULT VALUES RETURNING *;
Problems / scope of the solution
- This is only guaranteed to work for STABLE or IMMUTABLE default expressions. Most VOLATILE functions will work just as well, but there are no guarantees. The current_timestamp family of functions qualify as stable, since their values do not change within a transaction.
  In particular, this has side effects on serial columns (or any other defaults drawing from a sequence). But that should not be a problem, because you don't normally write to serial columns directly. Those shouldn't be listed in INSERT statements at all.
  Remaining flaw for serial columns: the sequence is still advanced by the single call to get a default row, producing a gap in the numbering. Again, that should not be a problem, because gaps are generally to be expected in serial columns.
- If you have columns defined NOT NULL, you have to insert dummy values and replace them with NULL in the result.
Two more problems can be solved:
We do not actually want to insert the dummy row. We could delete it later (in the same transaction), but that may have more side effects, like ON DELETE triggers. There is a better way:
Clone a temporary table including column defaults and insert into that:
BEGIN;
CREATE TEMP TABLE tmp_playlist_items (LIKE playlist_items INCLUDING DEFAULTS)
ON COMMIT DROP; -- drop at end of transaction
INSERT INTO tmp_playlist_items DEFAULT VALUES RETURNING *;
...
Same result, fewer side effects. Since default expressions are copied verbatim, the clone draws from the same sequences if any. But other side effects from the unwanted row or triggers are avoided completely.
Credit to Igor for the idea:
You would have to provide dummy values for NOT NULL columns, because (per documentation):
Not-null constraints are always copied to the new table.
Either accommodate those in the INSERT statement or (better) eliminate the constraints:
ALTER TABLE tmp_playlist_items
ALTER COLUMN foo DROP NOT NULL
, ALTER COLUMN bar DROP NOT NULL;
There is a quick and dirty way with superuser privileges:
UPDATE pg_attribute
SET attnotnull = FALSE
WHERE attrelid = 'tmp_playlist_items'::regclass
AND attnotnull
AND attnum > 0;
It is just a temporary table with no data and no other purpose, and it's dropped at the end of the transaction. So the shortcut is tempting. Still, the basic rule is: never tamper with system catalogs directly.
So, let's look into a clean way:
Automate with dynamic SQL in a DO statement. You just need the regular privileges you are guaranteed to have since the same role created the temp table.
DO $$BEGIN
EXECUTE (
SELECT 'ALTER TABLE tmp_playlist_items ALTER '
|| string_agg(quote_ident(attname), ' DROP NOT NULL, ALTER ')
|| ' DROP NOT NULL'
FROM pg_catalog.pg_attribute
WHERE attrelid = 'tmp_playlist_items'::regclass
AND attnotnull
AND attnum > 0
);
END$$;
Much cleaner and still very fast. Exercise care with dynamic commands and be wary of SQL injection. This statement is safe. I have posted several related answers with more explanation.
BEGIN;
CREATE TEMP TABLE tmp_playlist_items
(LIKE playlist_items INCLUDING DEFAULTS) ON COMMIT DROP;
DO $$BEGIN
EXECUTE (
SELECT 'ALTER TABLE tmp_playlist_items ALTER '
|| string_agg(quote_ident(attname), ' DROP NOT NULL, ALTER ')
|| ' DROP NOT NULL'
FROM pg_catalog.pg_attribute
WHERE attrelid = 'tmp_playlist_items'::regclass
AND attnotnull
AND attnum > 0
);
END$$;
LOCK TABLE playlist_items IN EXCLUSIVE MODE; -- forbid concurrent writes
WITH default_row AS (
INSERT INTO tmp_playlist_items DEFAULT VALUES RETURNING *
)
, new_values (id, playlist, item, group_name, duration, sort, legacy) AS (
VALUES
(651, 21, 30012, 'a', 30, 1, FALSE)
, (NULL, 21, 1, 'b', 34, 2, NULL)
, (668, 21, 30012, 'c', 30, 3, FALSE)
, (7428, 21, 23068, 'd', 0, 4, FALSE)
)
, upsert AS ( -- *not* replacing existing values in UPDATE (?)
UPDATE playlist_items m
SET ( playlist, item, group_name, duration, sort, legacy)
= (n.playlist, n.item, n.group_name, n.duration, n.sort, n.legacy)
-- ..., COALESCE(n.legacy, m.legacy) -- see below
FROM new_values n
WHERE n.id = m.id
RETURNING m.id
)
INSERT INTO playlist_items
(playlist, item, group_name, duration, sort, legacy)
SELECT n.playlist, n.item, n.group_name, n.duration, n.sort
, COALESCE(n.legacy, d.legacy)
FROM new_values n, default_row d -- single row can be cross-joined
WHERE NOT EXISTS (SELECT 1 FROM upsert u WHERE u.id = n.id)
RETURNING id;
COMMIT;
You only need the LOCK if you have concurrent transactions trying to write to the same table.
As requested, this only replaces NULL values in the column legacy in the input rows for the INSERT case. It can easily be extended to work for other columns or in the UPDATE case as well. For instance, you could UPDATE conditionally as well: only if the input value is NOT NULL. I added a commented line to the UPDATE above.
Aside: You do not need to cast values in any row but the first in a VALUES expression, since types are derived from the first row.
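A quick way to check how the column types of a VALUES list are resolved is pg_typeof(). The two-column list below is a cut-down, hypothetical version of the example rows:

```sql
-- Inspect the resolved column types of a VALUES list.
-- The untyped NULLs in the second row carry no type of their own;
-- they take whatever type is resolved for their column.
SELECT pg_typeof(id)     AS id_type
     , pg_typeof(legacy) AS legacy_type
FROM  (
   VALUES (651 , FALSE)
        , (NULL, NULL)
   ) v (id, legacy);
```

Both rows should report integer and boolean, without any explicit cast in the second row.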
Postgres 9.5 implements UPSERT with INSERT .. ON CONFLICT .. DO NOTHING | UPDATE. This largely simplifies the operation:
INSERT INTO playlist_items AS m (id, playlist, item, group_name, duration, sort, legacy)
VALUES (651, 21, 30012, 'a', 30, 1, FALSE)
, (DEFAULT, 21, 1, 'b', 34, 2, DEFAULT) -- !
, (668, 21, 30012, 'c', 30, 3, FALSE)
, (7428, 21, 23068, 'd', 0, 4, FALSE)
ON CONFLICT (id) DO UPDATE
SET (playlist, item, group_name, duration, sort, legacy)
= (EXCLUDED.playlist, EXCLUDED.item, EXCLUDED.group_name
, EXCLUDED.duration, EXCLUDED.sort, EXCLUDED.legacy)
-- (..., COALESCE(m.legacy, EXCLUDED.legacy)) -- see below
RETURNING m.id;
We can attach the VALUES clause to INSERT directly, which allows the DEFAULT keyword. In the case of unique violations on (id), Postgres updates instead. We can use excluded rows in the UPDATE. The manual:
The SET and WHERE clauses in ON CONFLICT DO UPDATE have access to the existing row using the table's name (or an alias), and to rows proposed for insertion using the special excluded table.
And:
Note that the effects of all per-row BEFORE INSERT triggers are reflected in excluded values, since those effects may have contributed to the row being excluded from insertion.
Remaining corner case
You have various options for the UPDATE: You can ...
- ... not update at all: add a WHERE clause to the UPDATE to only write to selected rows.
- ... only update selected columns.
- ... only update if the column is currently NULL: COALESCE(m.legacy, EXCLUDED.legacy)
- ... only update if the new value is NOT NULL: COALESCE(EXCLUDED.legacy, m.legacy)
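Several of these options can be combined in one statement. The following is only a sketch: it assumes Postgres 9.5 or later, a PRIMARY KEY or UNIQUE constraint on id, and a nullable legacy column. The WHERE condition is a hypothetical filter chosen for illustration.

```sql
-- Assumptions: Postgres 9.5+, unique index on id, legacy is nullable.
INSERT INTO playlist_items AS m (id, playlist, item, group_name, duration, sort, legacy)
VALUES (651, 21, 30012, 'a', 30, 1, NULL)
ON CONFLICT (id) DO UPDATE
SET    duration = EXCLUDED.duration                    -- only update selected columns
     , legacy   = COALESCE(EXCLUDED.legacy, m.legacy)  -- keep the old value if the new one is NULL
WHERE  m.playlist = EXCLUDED.playlist                  -- hypothetical filter: only write to selected rows
RETURNING m.id;
```

Rows that conflict but fail the WHERE condition are neither inserted nor updated, and they do not show up in the RETURNING result.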
But there is no way to discern DEFAULT values from values actually provided in the INSERT. Only the resulting EXCLUDED rows are visible. If you need the distinction, fall back to the previous solution, where both are at our disposal.