Ignoring errors in concurrent insertions
Problem description
I have a string vector data
containing items that I want to insert into a table named foos
. It's possible that some of the elements in data
already exist in the table, so I must watch out for those.
The solution I'm using starts by transforming the data
vector into virtual table old_and_new
; it then builds virtual table old
which contains the elements which are already present in foos
; then, it constructs virtual table new
with the elements
which are really new. Finally, it inserts the new elements in table foos
.
WITH old_and_new AS (SELECT unnest($data::text[]) AS foo),
old AS (SELECT foo FROM foos INNER JOIN old_and_new USING (foo)),
new AS (SELECT * FROM old_and_new EXCEPT SELECT * FROM old)
INSERT INTO foos (foo) SELECT foo FROM new
This works fine in a non-concurrent setting, but fails if concurrent threads
try to insert the same new element at the same time. I know I can solve this
by setting the isolation level to serializable
, but that's very heavy-handed.
Is there some other way I can solve this problem? If only there was a way to
tell PostgreSQL that it was safe to ignore INSERT
errors...
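For context, a minimal schema matching this setup might look like the following. The table name comes from the question; the PRIMARY KEY (unique) constraint on foo is an assumption, but some unique constraint must exist, or concurrent duplicate inserts would not raise errors in the first place:

```sql
-- Hypothetical minimal schema assumed by the question:
-- foos.foo must carry a UNIQUE or PRIMARY KEY constraint,
-- otherwise duplicate inserts would simply succeed.
CREATE TABLE foos (
    foo text PRIMARY KEY
);
```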
Whatever your course of action is (@Denis gave you quite a few options), this rewritten INSERT
command will be much faster:
INSERT INTO foos (foo)
SELECT n.foo
FROM unnest($data::text[]) AS n(foo)
LEFT JOIN foos o USING (foo)
WHERE o.foo IS NULL
It also leaves a much smaller time frame for a possible race condition.
In fact, the time frame should be so small that unique violations should only pop up under heavy concurrent load or with huge arrays.
Dupes in the array?
Except if your problem is built in: do you have duplicates in the input array itself? In this case, transaction isolation is not going to help you. The enemy is within!
Consider this example / solution:
INSERT INTO foos (foo)
SELECT n.foo
FROM (SELECT DISTINCT foo FROM unnest('{foo,bar,foo,baz}'::text[]) AS foo) n
LEFT JOIN foos o USING (foo)
WHERE o.foo IS NULL
I use DISTINCT
in the subquery to eliminate the "sleeper agents", a.k.a. duplicates.
People tend to forget that the dupes may come from within the import data.
Full automation
This function is one way to deal with concurrency for good. If a UNIQUE_VIOLATION
occurs, the INSERT
is just retried. The newly present rows are excluded from the new attempt automatically.
It does not take care of the opposite problem: a row might have been deleted concurrently, and it would not get re-inserted. One might argue that this outcome is OK, since such a DELETE
happened concurrently. If you want to prevent this, make use of SELECT ... FOR SHARE
to protect rows from concurrent DELETE
.
CREATE OR REPLACE FUNCTION f_insert_array(_data text[], OUT ins_ct int) AS
$func$
BEGIN
LOOP
BEGIN
INSERT INTO foos (foo)
SELECT n.foo
FROM (SELECT DISTINCT foo FROM unnest(_data) AS foo) n
LEFT JOIN foos o USING (foo)
WHERE o.foo IS NULL;
GET DIAGNOSTICS ins_ct = ROW_COUNT;
RETURN;
EXCEPTION WHEN UNIQUE_VIOLATION THEN -- foos.foo has UNIQUE constraint.
RAISE NOTICE 'It actually happened!'; -- hardly ever happens
END;
END LOOP;
END
$func$
LANGUAGE plpgsql;
I made the function return the count of inserted rows, which is completely optional.
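As an aside for readers on PostgreSQL 9.5 or later: the question's wish for "a way to tell PostgreSQL that it was safe to ignore INSERT errors" now exists as ON CONFLICT DO NOTHING, which makes the retry loop unnecessary. A sketch, assuming the same foos table with a unique foo column:

```sql
-- PostgreSQL 9.5+: rows that would violate the unique constraint
-- are silently skipped instead of raising an error, so no
-- exception handler or retry loop is needed. DO NOTHING also
-- tolerates duplicates within the same INSERT statement.
INSERT INTO foos (foo)
SELECT DISTINCT foo
FROM   unnest('{foo,bar,foo,baz}'::text[]) AS foo
ON     CONFLICT (foo) DO NOTHING;
```

On modern versions this supersedes the exception-handling function above; the function remains the portable option for 9.4 and earlier.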