How to include excluded rows in RETURNING from INSERT ... ON CONFLICT
I've got this table (generated by Django):
CREATE TABLE feeds_person (
id serial PRIMARY KEY,
created timestamp with time zone NOT NULL,
modified timestamp with time zone NOT NULL,
name character varying(4000) NOT NULL,
url character varying(1000) NOT NULL,
email character varying(254) NOT NULL,
CONSTRAINT feeds_person_name_ad8c7469_uniq UNIQUE (name, url, email)
);
I'm trying to bulk insert a lot of data using INSERT with an ON CONFLICT clause.
The wrinkle is that I need to get the id back for all of the rows, whether they already exist or not.
In other cases, I would do something like:
INSERT INTO feeds_person (created, modified, name, url, email)
VALUES blah blah blah
ON CONFLICT (name, url, email) DO UPDATE SET url = feeds_person.url
RETURNING id
Doing the UPDATE causes the statement to return the id of that row. Except, it doesn't work with this table. I think it doesn't work because I've got multiple fields that are unique together, whereas in the other instances where I've used this method I've had just one unique field.
I get this error when trying to run the SQL through Django's cursor:
django.db.utils.ProgrammingError: ON CONFLICT DO UPDATE command cannot affect row a second time
HINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.
How do I do the bulk insert with this table and get back the inserted and existing ids?
The error you get:
ON CONFLICT DO UPDATE command cannot affect row a second time
indicates you are trying to upsert the same row more than once in a single command. In other words: you have dupes on (name, url, email) in your VALUES list. Fold duplicates (if that's an option) and it should work. But you will have to decide which row to pick from each set of dupes.
INSERT INTO feeds_person (created, modified, name, url, email)
SELECT DISTINCT ON (name, url, email) *
FROM (VALUES blah blah blah) v(created, modified, name, url, email) -- match column list
ON CONFLICT (name, url, email) DO UPDATE
SET url = feeds_person.url
RETURNING id;
Since we use a free-standing VALUES expression now, you have to add explicit type casts for non-default types. Like:
VALUES
(timestamptz '2016-03-12 02:47:56+01'
, timestamptz '2016-03-12 02:47:56+01'
, 'n3', 'u3', 'e3')
...
Your timestamptz columns need an explicit type cast, while the string types can operate with default text. (You could still cast to varchar(n) right away.)
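There are ways to determine which row to pick from each set of dupes, for example by adding an ORDER BY clause to the DISTINCT ON query so that the first row of each group is well defined.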
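If the batch is assembled in Python before it reaches the database (as it would be with Django), the duplicates could also be folded on the client side. The following is only a sketch under assumptions: the helper name fold_duplicates, the tuple layout (created, modified, name, url, email) and the rule "keep the most recently modified row" are illustrative choices, not something the question specifies.

```python
from datetime import datetime, timezone


def fold_duplicates(rows):
    """Keep one row per (name, url, email): the one with the latest `modified`.

    Hypothetical helper. Each row is a tuple
    (created, modified, name, url, email).
    """
    best = {}
    for row in rows:
        key = row[2:5]  # (name, url, email), the unique constraint columns
        if key not in best or row[1] > best[key][1]:
            best[key] = row
    # dicts keep insertion order, so keys appear in first-seen order
    return list(best.values())


t1 = datetime(2016, 3, 12, 2, 47, 56, tzinfo=timezone.utc)
t2 = datetime(2016, 3, 13, 2, 47, 56, tzinfo=timezone.utc)
rows = [
    (t1, t1, "n3", "u3", "e3"),
    (t1, t2, "n3", "u3", "e3"),  # same key, newer `modified`
    (t1, t1, "n1", "u3", "e3"),
]
folded = fold_duplicates(rows)
```

With the duplicates gone from the batch, the "cannot affect row a second time" error can no longer be triggered by the VALUES list itself.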
You are right, there is (currently) no way to get excluded rows in the RETURNING clause. I quote the Postgres Wiki:
Note that RETURNING does not make visible the "EXCLUDED.*" alias from the UPDATE (just the generic "TARGET.*" alias is visible there). Doing so is thought to create annoying ambiguity for the simple, common cases [30] for little to no benefit. At some point in the future, we may pursue a way of exposing if RETURNING-projected tuples were inserted and updated, but this probably doesn't need to make it into the first committed iteration of the feature [31].
However, you shouldn't be updating rows that are not supposed to be updated. Empty updates are almost as expensive as regular updates and might have unintended side effects. You don't strictly need UPSERT to begin with; your case looks more like "SELECT or INSERT".
One cleaner way to insert a set of rows would be with data-modifying CTEs:
WITH val AS (
SELECT DISTINCT ON (name, url, email) *
FROM (
VALUES
(timestamptz '2016-1-1 0:0+1', timestamptz '2016-1-1 0:0+1', 'n', 'u', 'e')
, ('2016-03-12 02:47:56+01', '2016-03-12 02:47:56+01', 'n1', 'u3', 'e3')
-- more (type cast only needed in 1st row)
) v(created, modified, name, url, email)
)
, ins AS (
INSERT INTO feeds_person (created, modified, name, url, email)
SELECT created, modified, name, url, email FROM val
ON CONFLICT (name, url, email) DO NOTHING
RETURNING id, name, url, email
)
SELECT 'inserted' AS how, id FROM ins -- inserted
UNION ALL
SELECT 'selected' AS how, f.id -- not inserted
FROM val v
JOIN feeds_person f USING (name, url, email);
/* -- redundant for CONFLICT (name, url, email)
WHERE NOT EXISTS (
SELECT 1 FROM ins
WHERE name = v.name
AND url = v.url
AND email = v.email
);
*/
Originally, I had added a NOT EXISTS predicate to the last SELECT to prevent duplicates in the result. On second thought, that was redundant. All CTEs of a single query see the same snapshot of the tables, so the set returned by ON CONFLICT (name, url, email) DO NOTHING is mutually exclusive with the set returned by the INNER JOIN on the same columns.
This ignores the order of rows, which is not well defined as long as you have not defined exactly how to fold duplicates. You can have any order you want ...
The complexity should pay off for big tables where INSERT is the rule and SELECT the exception.
Else you might just run INSERT .. ON CONFLICT DO NOTHING, followed by a SELECT for all rows, within the same transaction. It would leave a tiny window for a race condition if a concurrent transaction commits writes to the table between INSERT and SELECT (in default READ COMMITTED isolation level).
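The two-step variant can be sketched in Python as follows. To keep the example runnable without a server it uses the standard library's sqlite3 module as a stand-in for Postgres (this assumes a bundled SQLite of version 3.24 or later, which is where ON CONFLICT ... DO NOTHING became available); the function name insert_then_select_ids and the simplified column types are assumptions for illustration only, and an in-memory SQLite database says nothing about the concurrency window described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE feeds_person (
        id INTEGER PRIMARY KEY,
        created TEXT NOT NULL,
        modified TEXT NOT NULL,
        name TEXT NOT NULL,
        url TEXT NOT NULL,
        email TEXT NOT NULL,
        UNIQUE (name, url, email)
    )
    """
)
ts = "2016-03-12 02:47:56+01"
# One row that exists before the bulk insert runs.
conn.execute(
    "INSERT INTO feeds_person (created, modified, name, url, email) "
    "VALUES (?, ?, ?, ?, ?)",
    (ts, ts, "n", "u", "e"),
)
conn.commit()


def insert_then_select_ids(conn, rows):
    """Hypothetical helper: insert what is new, then look up every id."""
    ids = {}
    with conn:  # INSERT and SELECTs share one transaction
        conn.executemany(
            "INSERT INTO feeds_person (created, modified, name, url, email) "
            "VALUES (?, ?, ?, ?, ?) "
            "ON CONFLICT (name, url, email) DO NOTHING",
            rows,
        )
        for row in rows:
            key = row[2:5]  # (name, url, email)
            ids[key] = conn.execute(
                "SELECT id FROM feeds_person "
                "WHERE name = ? AND url = ? AND email = ?",
                key,
            ).fetchone()[0]
    return ids


batch = [
    (ts, ts, "n", "u", "e"),     # already present, skipped by DO NOTHING
    (ts, ts, "n1", "u3", "e3"),  # new row
]
ids = insert_then_select_ids(conn, batch)
```

Against Postgres the per-row lookups would more likely be a single SELECT joined to the input set, as in the CTE query above; the loop is used here only to keep the sketch short.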
To close this window for good, use DO UPDATE on excluded rows, but add a condition to the update so that it does not actually update any rows that don't need an update:
ON CONFLICT (name, url, email) DO UPDATE
SET name = feeds_person.name WHERE FALSE -- never executed, only locks rows
Because (the manual for Postgres 9.5):
Only rows for which this expression returns true will be updated, although all rows will be locked when the ON CONFLICT DO UPDATE action is taken. Note that condition is evaluated last, after a conflict has been identified as a candidate to update.
The part that matters here is that all conflicting rows are locked even when none of them is updated.