函数永远运行大量记录 [英] Function taking forever to run for large number of records
问题描述
我在Postgres 9.3.5中创建了以下函数:
CREATE OR REPLACE FUNCTION get_result(val1 text,val2 text )
RETURNS text AS
$ BODY
$声明
结果文本;
BEGIN
从表
中选择min(id)到结果中,其中id_used为null,id_type = val2;
update table set
id_used ='Y',
col1 = val1,
id_used_date = now()
其中id_type = val2
和id = result;
RETURN result;
END;
$ BODY $
LANGUAGE plpgsql VOLATILE COST 100;
当我在超过1000条记录的循环中运行此函数时,说查询正在运行。当我检查我的表没有更新。当我运行它一两个记录它运行良好。
运行时的函数示例:
select get_result('123','idtype');
表列:
id字符变化(200),
col1字符变化(200),
id_used字符变化(1),
id_used_date没有时区的时间戳,
id_type字符(200)
id
有人可以帮助吗?
正在进入竞态条件。当您在独立事务中快速连续运行您的函数1000次时,会发生以下情况:
T1 T2 T3 ...
SELECT max(id) - id 1
SELECT max(id) - id 1
SELECT max(id) - id 1
...
行id 1锁定,等待...
行id 1锁定,等待...
UPDATE id 1
...
$ b b COMMIT
唤醒,UPDATE id 1!
COMMIT
唤醒,UPDATE id 1了!
COMMIT
...
大量重写和简化为SQL函数: / p>
CREATE OR REPLACE FUNCTION get_result(val1 text,val2 text)
RETURNS text AS
$ func $
UPDATE table t
SET id_used ='Y'
,col1 = val1
,id_used_date = now()
FROM(
SELECT id
FROM table
WHERE id_used IS NULL
AND id_type = val2
ORDER BY id
LIMIT 1
FOR UPDATE - 锁定以避免竞争条件!
)t1
WHERE t.id_type = val2
- AND t.id_used IS NULL - 重复条件(不是如果行被锁定)
AND t.id = t1。 id
RETURNING id;
$ func $ LANGUAGE sql;
相关问题有更多解释:
说明
-
不要运行两个单独的SQL语句。这是更昂贵的,并加宽了种族条件的时间框架。
-
你不需要PL / pgSQL的简单的任务。您仍然可以使用PL / pgSQL,
UPDATE
保持不变。 - p>您需要锁定所选行以防范竞争条件。但是你不能使用聚合函数来执行此操作,因为每个文档:
锁定子句不能用于上下文中返回的行
无法用单个表行清楚地标识;例如
,不能与汇总一起使用。
-
大胆强调我。幸运的是,您可以轻松地用
ORDER BY
/LIMIT 1替换
我在上面提供。min(id)
-
如果表格很大,您需要
id
。假设id
已经被索引为PRIMARY KEY
,这将有所帮助。但是,此额外的部分多列索引可能会帮助更多:CREATE INDEX foo_idx ON表(id_type,id)
WHERE id_used IS NULL;
替代解决方案
咨询锁可能是最好的方法:
<您可能希望一次锁定多行:
I have created the following function in Postgres 9.3.5:
CREATE OR REPLACE FUNCTION get_result(val1 text, val2 text)
RETURNS text AS
$BODY
$Declare
result text;
BEGIN
select min(id) into result from table
where id_used is null and id_type = val2;
update table set
id_used = 'Y',
col1 = val1,
id_used_date = now()
where id_type = val2
and id = result;
RETURN result;
END;
$BODY$
LANGUAGE plpgsql VOLATILE COST 100;
When I run this function in a loop of over a 1000 or more records it just does freezing and just says "query is running". When I check my table nothing is being updated. When I run it for one or two records it runs fine.
Example of the function when being run:
select get_result('123','idtype');
table columns:
id character varying(200),
col1 character varying(200),
id_used character varying(1),
id_used_date timestamp without time zone,
id_type character(200)
id
is the table index.
Can someone help?
Most probably you are running into race conditions. When you run your function a 1000 times in quick succession in separate transactions, something like this happens:
T1 T2 T3 ...
SELECT max(id) -- id 1
SELECT max(id) -- id 1
SELECT max(id) -- id 1
...
Row id 1 locked, wait ...
Row id 1 locked, wait ...
UPDATE id 1
...
COMMIT
Wake up, UPDATE id 1 again!
COMMIT
Wake up, UPDATE id 1 again!
COMMIT
...
Largely rewritten and simplified as SQL function:
CREATE OR REPLACE FUNCTION get_result(val1 text, val2 text)
RETURNS text AS
$func$
UPDATE table t
SET id_used = 'Y'
, col1 = val1
, id_used_date = now()
FROM (
SELECT id
FROM table
WHERE id_used IS NULL
AND id_type = val2
ORDER BY id
LIMIT 1
FOR UPDATE -- lock to avoid race condition! see below ...
) t1
WHERE t.id_type = val2
-- AND t.id_used IS NULL -- repeat condition (not if row is locked)
AND t.id = t1.id
RETURNING id;
$func$ LANGUAGE sql;
Related question with a lot more explanation:
Explain
Don't run two separate SQL statements. That is more expensive and widens the time frame for race conditions. One
UPDATE
with a subquery is much better.You don't need PL/pgSQL for the simple task. You still can use PL/pgSQL, the
UPDATE
stays the same.You need to lock the selected row to defend against race conditions. But you cannot do this with the aggregate function you head because, per documentation:
The locking clauses cannot be used in contexts where returned rows cannot be clearly identified with individual table rows; for example they cannot be used with aggregation.
Bold emphasis mine. Luckily, you can replace
min(id)
easily with the equivalentORDER BY
/LIMIT 1
I provided above. Can use an index just as well.If the table is big, you need an index on
id
at least. Assuming thatid
is indexed already asPRIMARY KEY
, that would help. But this additional partial multicolumn index would probably help a lot more:CREATE INDEX foo_idx ON table (id_type, id) WHERE id_used IS NULL;
Alternative solutions
Advisory locks May be the superior approach here:
Or you may want to lock many rows at once:
这篇关于函数永远运行大量记录的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!