Function taking forever to run for large number of records
Question
I have created the following function in Postgres 9.3.5:
CREATE OR REPLACE FUNCTION get_result(val1 text, val2 text)
RETURNS text AS
$BODY$
DECLARE
    result text;
BEGIN
    -- pick the lowest unused id of the requested type
    select min(id) into result from table
    where id_used is null and id_type = val2;
    -- mark that id as used
    update table set
        id_used = 'Y',
        col1 = val1,
        id_used_date = now()
    where id_type = val2
    and id = result;
    RETURN result;
END;
$BODY$
LANGUAGE plpgsql VOLATILE COST 100;
When I run this function in a loop over 1000 or more records, it just freezes and only says "query is running". When I check my table, nothing is being updated. When I run it for one or two records, it runs fine.
Example of how the function is called:
select get_result('123','idtype');
Table columns:
id character varying(200),
col1 character varying(200),
id_used character varying(1),
id_used_date timestamp without time zone,
id_type character(200)
id is the table index.
Can anyone help?
Recommended answer
Most probably you are running into race conditions. When you run your function 1000 times in quick succession in separate transactions, something like this happens:
T1                      T2                           T3                           ...
SELECT min(id) -- id 1
                        SELECT min(id) -- id 1
                                                     SELECT min(id) -- id 1
                                                                                  ...
                        Row id 1 locked, wait ...
                                                     Row id 1 locked, wait ...
UPDATE id 1
                                                                                  ...
COMMIT
                        Wake up, UPDATE id 1 again!
                        COMMIT
                                                     Wake up, UPDATE id 1 again!
                                                     COMMIT
                                                                                  ...
Largely rewritten and simplified as an SQL function:
CREATE OR REPLACE FUNCTION get_result(val1 text, val2 text)
  RETURNS text AS
$func$
UPDATE table t
SET    id_used = 'Y'
     , col1 = val1
     , id_used_date = now()
FROM  (
   SELECT id
   FROM   table
   WHERE  id_used IS NULL
   AND    id_type = val2
   ORDER  BY id
   LIMIT  1
   FOR    UPDATE   -- lock to avoid race condition! see below ...
   ) t1
WHERE  t.id_type = val2
-- AND    t.id_used IS NULL -- repeat condition (not if row is locked)
AND    t.id = t1.id
RETURNING t.id;    -- table-qualified: both t and t1 expose a column named id
$func$ LANGUAGE sql;
Related question with a lot more explanation:
Don't run two separate SQL statements. That is more expensive and widens the time frame for race conditions. One UPDATE with a subquery is much better.
You don't need PL/pgSQL for this simple task. You can still use PL/pgSQL; the UPDATE stays the same.
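For anyone who prefers to stay in PL/pgSQL, here is a minimal sketch of how the same single UPDATE might be wrapped. The table name tbl and the function name get_result_plpgsql are placeholders introduced for illustration (the original post only calls the table "table", which is a reserved word), so adjust both to your schema.

```sql
-- Sketch only: tbl and get_result_plpgsql are hypothetical names.
CREATE OR REPLACE FUNCTION get_result_plpgsql(val1 text, val2 text)
  RETURNS text AS
$func$
DECLARE
   result text;
BEGIN
   -- Same statement as in the SQL function: pick, lock and update in one go.
   UPDATE tbl t
   SET    id_used      = 'Y'
        , col1         = val1
        , id_used_date = now()
   FROM  (
      SELECT id
      FROM   tbl
      WHERE  id_used IS NULL
      AND    id_type = val2
      ORDER  BY id
      LIMIT  1
      FOR    UPDATE            -- row lock instead of the min(id) aggregate
      ) t1
   WHERE  t.id_type = val2
   AND    t.id = t1.id
   RETURNING t.id INTO result; -- stays NULL if no row was updated

   RETURN result;
END
$func$ LANGUAGE plpgsql;
```

It would be called like the original, for example select get_result_plpgsql('123','idtype'); and it returns NULL when no free id of that type could be claimed.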
You need to lock the selected row to defend against race conditions. But you cannot do this with the aggregate function you had because, per documentation:
The locking clauses cannot be used in contexts where returned rows cannot be clearly identified with individual table rows; for example they cannot be used with aggregation.
Luckily, you can easily replace min(id) with the equivalent ORDER BY / LIMIT 1 I provided above. It can use an index just as well.
If the table is big, you need an index on id at least. Assuming that id is indexed already as PRIMARY KEY, that would help. But this additional partial multicolumn index would probably help a lot more:
CREATE INDEX foo_idx ON table (id_type, id) WHERE id_used IS NULL;
Advisory locks may be the superior approach here:
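As a rough illustration of the idea (my own sketch, not taken from the linked answer): pg_try_advisory_xact_lock() returns immediately instead of waiting, so concurrent callers can skip rows that another transaction is already working on. Several things here are assumptions: the table name tbl and the function name get_result_adv are placeholders, and because id is a varchar while advisory locks take integer keys, the sketch hashes the id with hashtext(), an internal and undocumented PostgreSQL function. A hash collision would only cause a free row to be skipped unnecessarily.

```sql
-- Sketch only: tbl and get_result_adv are hypothetical names.
CREATE OR REPLACE FUNCTION get_result_adv(val1 text, val2 text)
  RETURNS text AS
$func$
UPDATE tbl t
SET    id_used      = 'Y'
     , col1         = val1
     , id_used_date = now()
FROM  (
   SELECT id
   FROM   tbl
   WHERE  id_used IS NULL
   AND    id_type = val2
   -- true if this transaction obtained the lock, false if someone else holds it
   AND    pg_try_advisory_xact_lock(hashtext(id))
   ORDER  BY id
   LIMIT  1
   ) t1
WHERE  t.id = t1.id
AND    t.id_used IS NULL   -- recheck: another transaction may just have committed
RETURNING t.id;
$func$ LANGUAGE sql;
```

Depending on the query plan, the lock function may be evaluated for more rows than the LIMIT returns, so a transaction can end up holding a few extra advisory locks until it commits or rolls back. The function can also return NULL even though free rows remain, in which case the caller would simply try again.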
Or you may want to lock many rows at once:
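A minimal sketch of what a batch variant could look like, assuming it is acceptable to stamp the same val1 on every claimed row. The table name tbl, the function name get_results and the parameter n are placeholders introduced for illustration.

```sql
-- Sketch only: tbl, get_results and n are hypothetical names.
CREATE OR REPLACE FUNCTION get_results(val1 text, val2 text, n integer)
  RETURNS SETOF text AS
$func$
UPDATE tbl t
SET    id_used      = 'Y'
     , col1         = val1
     , id_used_date = now()
FROM  (
   SELECT id
   FROM   tbl
   WHERE  id_used IS NULL
   AND    id_type = val2
   ORDER  BY id
   LIMIT  n           -- claim up to n rows in a single statement
   FOR    UPDATE
   ) t1
WHERE  t.id = t1.id
RETURNING t.id;       -- one output row per claimed id, in no guaranteed order
$func$ LANGUAGE sql;
```

A call such as SELECT * FROM get_results('123', 'idtype', 100); would then replace 100 separate calls with one statement. Under concurrent use it may return fewer than n ids.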