How to include excluded rows in RETURNING from INSERT ... ON CONFLICT


Problem description


I've got this table (generated by Django):

CREATE TABLE feeds_person (
  id serial PRIMARY KEY,
  created timestamp with time zone NOT NULL,
  modified timestamp with time zone NOT NULL,
  name character varying(4000) NOT NULL,
  url character varying(1000) NOT NULL,
  email character varying(254) NOT NULL,
  CONSTRAINT feeds_person_name_ad8c7469_uniq UNIQUE (name, url, email)
);

I'm trying to bulk insert a lot of data using INSERT with an ON CONFLICT clause.

The wrinkle is that I need to get the id back for all of the rows, whether they're already existing or not.

In other cases, I would do something like:

INSERT INTO feeds_person (created, modified, name, url, email)
VALUES blah blah blah
ON CONFLICT (name, url, email) DO UPDATE SET url = feeds_person.url
RETURNING id

Doing the UPDATE causes the statement to return the id of that row. Except, it doesn't work with this table. I think it doesn't work because I've got multiple fields unique together whereas in other instances I've used this method I've had just one unique field.

I get this error when trying to run the SQL through Django's cursor:

django.db.utils.ProgrammingError: ON CONFLICT DO UPDATE command cannot affect row a second time
HINT:  Ensure that no rows proposed for insertion within the same command have duplicate constrained values.

How do I do the bulk insert with this table and get back the inserted and existing ids?

Solution

The error you get:

ON CONFLICT DO UPDATE command cannot affect row a second time

indicates you are trying to upsert the same row more than once in a single command. In other words: you have dupes on (name, url, email) in your VALUES list. Fold duplicates (if that's an option) and it should work. But you will have to decide which row to pick from each set of dupes.

INSERT INTO feeds_person (created, modified, name, url, email)
SELECT DISTINCT ON (name, url, email) *
FROM  (VALUES blah blah blah) v(created, modified, name, url, email)  -- match column list
ON     CONFLICT (name, url, email) DO UPDATE
SET    url = feeds_person.url
RETURNING id;
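If the rows are assembled in Python before they reach the database (the question mentions Django), the duplicates could also be folded client-side. The following is only a sketch under that assumption: `PersonRow` and `fold_duplicates` are hypothetical names, not part of Django or the original code, and "first row seen wins" is an arbitrary policy you would replace with your own.

```python
from __future__ import annotations

from typing import Iterable, NamedTuple


class PersonRow(NamedTuple):
    """Hypothetical container mirroring the insert column list."""
    created: str
    modified: str
    name: str
    url: str
    email: str


def fold_duplicates(rows: Iterable[PersonRow]) -> list[PersonRow]:
    """Keep one row per (name, url, email); the first row seen wins."""
    kept: dict[tuple[str, str, str], PersonRow] = {}
    for row in rows:
        key = (row.name, row.url, row.email)
        kept.setdefault(key, row)  # later duplicates of the same key are dropped
    return list(kept.values())
```

With the input folded like this, no two proposed rows share the constrained values, which is exactly what the HINT in the error message asks for.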

Since we use a free-standing VALUES expression now, you have to add explicit type casts for non-default types. Like:

VALUES
    (timestamptz '2016-03-12 02:47:56+01'
   , timestamptz '2016-03-12 02:47:56+01'
   , 'n3', 'u3', 'e3')
   ...

Your timestamptz columns need an explicit type cast, while the string types can operate with default text. (You could still cast to varchar(n) right away.)

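There are ways to determine which row to pick from each set of dupes. For instance, adding ORDER BY name, url, email, modified DESC to the DISTINCT ON query would keep the most recently modified row of each set.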

You are right, there is (currently) no way to get excluded rows in the RETURNING clause. I quote the Postgres Wiki:

Note that RETURNING does not make visible the "EXCLUDED.*" alias from the UPDATE (just the generic "TARGET.*" alias is visible there). Doing so is thought to create annoying ambiguity for the simple, common cases [30] for little to no benefit. At some point in the future, we may pursue a way of exposing if RETURNING-projected tuples were inserted and updated, but this probably doesn't need to make it into the first committed iteration of the feature [31].

However, you shouldn't be updating rows that are not supposed to be updated. Empty updates are almost as expensive as regular updates - and might have unintended side effects. You don't strictly need UPSERT to begin with; your case looks more like "SELECT or INSERT".

One cleaner way to insert a set of rows would be with data-modifying CTEs:

WITH val AS (
   SELECT DISTINCT ON (name, url, email) *
   FROM  (
      VALUES 
      (timestamptz '2016-1-1 0:0+1', timestamptz '2016-1-1 0:0+1', 'n', 'u', 'e')
    , ('2016-03-12 02:47:56+01', '2016-03-12 02:47:56+01', 'n1', 'u3', 'e3')
      -- more (type cast only needed in 1st row)
      ) v(created, modified, name, url, email)
   )
, ins AS (
   INSERT INTO feeds_person (created, modified, name, url, email)
   SELECT created, modified, name, url, email FROM val
   ON     CONFLICT (name, url, email) DO NOTHING
   RETURNING id, name, url, email
   )
SELECT 'inserted' AS how, id FROM ins  -- inserted
UNION  ALL
SELECT 'selected' AS how, f.id         -- not inserted
FROM   val v
JOIN   feeds_person f USING (name, url, email);

/*  -- redundant for CONFLICT (name, url, email)
WHERE  NOT EXISTS (
   SELECT 1 FROM ins
   WHERE  name  = v.name
   AND    url   = v.url
   AND    email = v.email
   );
*/

Originally, I had added a NOT EXISTS predicate to prevent duplicates in the result. On second thought, that was redundant. All CTEs of a single query see the same snapshot of the tables. The set returned with ON CONFLICT (name, url, email) DO NOTHING is mutually exclusive to the set returned after the INNER JOIN on the same columns.

This is ignoring the order of rows, since it's not well defined while you still did not define how to fold duplicates exactly. You can have any order you want ...
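To run the CTE query above for a variable number of rows from Python, the statement text has to be generated. Here is a rough sketch, assuming psycopg2-style `%s` placeholders as used by Django's cursor; `build_select_or_insert_sql` is a hypothetical helper name, and only the first VALUES row carries the timestamptz casts, as noted in the query comment.

```python
def build_select_or_insert_sql(n_rows: int) -> str:
    """Build the CTE query for n_rows rows of (created, modified, name, url, email)."""
    if n_rows < 1:
        raise ValueError("need at least one row")
    # Only the first row is cast explicitly; the remaining rows take the same types.
    first = "(%s::timestamptz, %s::timestamptz, %s, %s, %s)"
    rest = "(%s, %s, %s, %s, %s)"
    values = ",\n      ".join([first] + [rest] * (n_rows - 1))
    return f"""
WITH val AS (
   SELECT DISTINCT ON (name, url, email) *
   FROM  (
      VALUES
      {values}
      ) v(created, modified, name, url, email)
   )
, ins AS (
   INSERT INTO feeds_person (created, modified, name, url, email)
   SELECT created, modified, name, url, email FROM val
   ON     CONFLICT (name, url, email) DO NOTHING
   RETURNING id
   )
SELECT 'inserted' AS how, id FROM ins
UNION  ALL
SELECT 'selected' AS how, f.id
FROM   val v
JOIN   feeds_person f USING (name, url, email)
"""
```

A call would then look something like `cursor.execute(build_select_or_insert_sql(len(rows)), [v for row in rows for v in row])`, where `rows` is a list of 5-tuples in column order. Only placeholders are generated, never data, so the values are still passed as parameters.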

The complexity should pay for big tables where INSERT is the rule and SELECT the exception.
Else you might just INSERT .. ON CONFLICT DO NOTHING, followed by a SELECT for all rows - within the same transaction. It would leave a tiny window for a race condition if a concurrent transaction commits writes to the table between INSERT and SELECT (in default READ COMMITTED isolation level).
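The simpler "INSERT, then SELECT in the same transaction" variant might look like the sketch below. To keep it runnable without a server, it uses Python's built-in sqlite3 module as a stand-in for Postgres (an assumption made for illustration only; SQLite 3.24 or later accepts the same ON CONFLICT ... DO NOTHING clause). `make_demo_db` and `insert_then_select` are hypothetical names. SQLite does not reproduce the concurrency behavior discussed here, so the race window is not demonstrated.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for the feeds_person table (SQLite, not Postgres)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE feeds_person ("
        " id INTEGER PRIMARY KEY,"
        " created TEXT NOT NULL, modified TEXT NOT NULL,"
        " name TEXT NOT NULL, url TEXT NOT NULL, email TEXT NOT NULL,"
        " UNIQUE (name, url, email))"
    )
    conn.commit()
    return conn


def insert_then_select(conn, rows):
    """rows: iterable of (created, modified, name, url, email) tuples.

    Returns {(name, url, email): id} for every row, new or pre-existing.
    """
    rows = list(rows)
    ids = {}
    with conn:  # one transaction: commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO feeds_person (created, modified, name, url, email) "
            "VALUES (?, ?, ?, ?, ?) "
            "ON CONFLICT (name, url, email) DO NOTHING",
            rows,
        )
        for _, _, name, url, email in rows:
            cur = conn.execute(
                "SELECT id FROM feeds_person "
                "WHERE name = ? AND url = ? AND email = ?",
                (name, url, email),
            )
            ids[(name, url, email)] = cur.fetchone()[0]
    return ids
```

Duplicates inside `rows` are harmless here because DO NOTHING simply skips them, unlike DO UPDATE.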

To close this window for good, use DO UPDATE on excluded rows, but add a condition to the update to not actually update any rows that don't need an update:

ON CONFLICT (name, url, email) DO UPDATE
SET name = feeds_person.name WHERE FALSE  -- never executed, only locks rows

Because (the manual for Postgres 9.5):

Only rows for which this expression returns true will be updated, although all rows will be locked when the ON CONFLICT DO UPDATE action is taken. Note that condition is evaluated last, after a conflict has been identified as a candidate to update.

Bold emphasis mine.
