Ignoring errors in concurrent insertions
Problem description
I have a string vector data
containing items that I want to insert into a table named foos
. It's possible that some of the elements in data
already exist in the table, so I must watch out for those.
The solution I'm using starts by transforming the data
vector into virtual table old_and_new
; it then builds virtual table old
which contains the elements which are already present in foos
; then, it constructs virtual table new
with the elements
which are really new. Finally, it inserts the new elements in table foos
.
WITH old_and_new AS (SELECT unnest($data::text[]) AS foo),
old AS (SELECT foo FROM foos INNER JOIN old_and_new USING (foo)),
new AS (SELECT * FROM old_and_new EXCEPT SELECT * FROM old)
INSERT INTO foos (foo) SELECT foo FROM new
This works fine in a non-concurrent setting, but fails if concurrent threads
try to insert the same new element at the same time. I know I can solve this
by setting the isolation level to serializable
, but that's very heavy-handed.
Is there some other way I can solve this problem? If only there was a way to
tell PostgreSQL that it was safe to ignore INSERT
errors...
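For context, a minimal schema matching this setup might look like the following. The table name comes from the question; the PRIMARY KEY (unique) constraint on foo is an assumption, but some unique constraint must exist, or concurrent duplicate inserts would not raise errors in the first place:

```sql
-- Hypothetical minimal schema assumed by the question:
-- foos.foo must carry a UNIQUE or PRIMARY KEY constraint,
-- otherwise duplicate inserts would simply succeed.
CREATE TABLE foos (
    foo text PRIMARY KEY
);
```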
Whatever your course of action is (@Denis gave you quite a few options), this rewritten INSERT
command will be much faster:
INSERT INTO foos (foo)
SELECT n.foo
FROM unnest($data::text[]) AS n(foo)
LEFT JOIN foos o USING (foo)
WHERE o.foo IS NULL
It also leaves a much smaller time frame for a possible race condition.
In fact, the time frame should be so small that unique violations should only pop up under heavy concurrent load or with huge arrays.
Dupes in the array?
Except if your problem is built in: do you have duplicates in the input array itself? In this case, transaction isolation is not going to help you. The enemy is within!
Consider this example / solution:
INSERT INTO foos (foo)
SELECT n.foo
FROM (SELECT DISTINCT foo FROM unnest('{foo,bar,foo,baz}'::text[]) AS foo) n
LEFT JOIN foos o USING (foo)
WHERE o.foo IS NULL
I use DISTINCT
in the subquery to eliminate the "sleeper agents", a.k.a. duplicates.
People tend to forget that the dupes may come from within the import data.
Full automation
This function is one way to deal with concurrency for good. If a UNIQUE_VIOLATION
occurs, the INSERT
is just retried. The newly present rows are excluded from the new attempt automatically.
It does not take care of the opposite problem: a row might have been deleted concurrently, and it would not get re-inserted. One might argue that this outcome is OK, since such a DELETE
happened concurrently. If you want to prevent this, make use of SELECT ... FOR SHARE
to protect rows from concurrent DELETE
.
CREATE OR REPLACE FUNCTION f_insert_array(_data text[], OUT ins_ct int) AS
$func$
BEGIN
LOOP
BEGIN
INSERT INTO foos (foo)
SELECT n.foo
FROM (SELECT DISTINCT foo FROM unnest(_data) AS foo) n
LEFT JOIN foos o USING (foo)
WHERE o.foo IS NULL;
GET DIAGNOSTICS ins_ct = ROW_COUNT;
RETURN;
EXCEPTION WHEN UNIQUE_VIOLATION THEN -- foos.foo has UNIQUE constraint.
RAISE NOTICE 'It actually happened!'; -- hardly ever happens
END;
END LOOP;
END
$func$
LANGUAGE plpgsql;
I made the function return the count of inserted rows, which is completely optional.
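As an aside for readers on PostgreSQL 9.5 or later: the question's wish for "a way to tell PostgreSQL that it was safe to ignore INSERT errors" now exists as ON CONFLICT DO NOTHING, which makes the retry loop unnecessary. A sketch, assuming the same foos table with a unique foo column:

```sql
-- PostgreSQL 9.5+: rows that would violate the unique constraint
-- are silently skipped instead of raising an error, so no
-- exception handler or retry loop is needed. DO NOTHING also
-- tolerates duplicates within the same INSERT statement.
INSERT INTO foos (foo)
SELECT DISTINCT foo
FROM   unnest('{foo,bar,foo,baz}'::text[]) AS foo
ON     CONFLICT (foo) DO NOTHING;
```

On modern versions this supersedes the exception-handling function above; the function remains the portable option for 9.4 and earlier.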