函数永远运行大量记录 [英] Function taking forever to run for large number of records

查看:141
本文介绍了函数永远运行大量记录的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在Postgres 9.3.5中创建了以下函数:

  CREATE OR REPLACE FUNCTION get_result(val1 text,val2 text )
RETURNS text AS
$ BODY
$声明

结果文本;

BEGIN

从表
中选择min(id)到结果中,其中id_used为null,id_type = val2;

update table set
id_used ='Y',
col1 = val1,
id_used_date = now()
其中id_type = val2
和id = result;

RETURN result;

END;

$ BODY $
LANGUAGE plpgsql VOLATILE COST 100;

当我在超过1000条记录的循环中运行此函数时,说查询正在运行。当我检查我的表没有更新。当我运行它一两个记录它运行良好。



运行时的函数示例:

  select get_result('123','idtype'); 

表列:

  id字符变化(200),
col1字符变化(200),
id_used字符变化(1),
id_used_date没有时区的时间戳,
id_type字符(200)

id



有人可以帮助吗?

解决方案

正在进入竞态条件。当您在独立事务中快速连续运行您的函数1000次时,会发生以下情况:

  T1 T2 T3 ... 
SELECT max(id) - id 1
SELECT max(id) - id 1
SELECT max(id) - id 1
...
行id 1锁定,等待...
行id 1锁定,等待...
UPDATE id 1
...
$ b b COMMIT
唤醒,UPDATE id 1!
COMMIT
唤醒,UPDATE id 1了!
COMMIT
...

大量重写和简化为SQL函数: / p>

  CREATE OR REPLACE FUNCTION get_result(val1 text,val2 text)
RETURNS text AS
$ func $
UPDATE table t
SET id_used ='Y'
,col1 = val1
,id_used_date = now()
FROM(
SELECT id
FROM table
WHERE id_used IS NULL
AND id_type = val2
ORDER BY id
LIMIT 1
FOR UPDATE - 锁定以避免竞争条件!
)t1
WHERE t.id_type = val2
- AND t.id_used IS NULL - 重复条件(不是如果行被锁定)
AND t.id = t1。 id
RETURNING id;
$ func $ LANGUAGE sql;

相关问题有更多解释:





说明






锁定子句不能用于上下文中返回的行
无法用单个表行清楚地标识;例如
不能与汇总一起使用。





  • 大胆强调我。幸运的是,您可以轻松地用 ORDER BY / LIMIT 1替换 min(id) 我在上面提供。


  • 如果表格很大,您需要 id 。假设 id 已经被索引为 PRIMARY KEY ,这将有所帮助。但是,此额外的部分多列索引可能会帮助更多:

      CREATE INDEX foo_idx ON表(id_type,id)
    WHERE id_used IS NULL;




替代解决方案



咨询锁可能是最好的方法:





<您可能希望一次锁定多行




I have created the following function in Postgres 9.3.5:

CREATE OR REPLACE FUNCTION get_result(val1 text, val2 text)
RETURNS text AS 
$BODY
$Declare

result text;

BEGIN

select min(id) into result from table 
where id_used is null and id_type = val2;

update table set 
id_used = 'Y', 
col1 = val1,  
id_used_date = now() 
where id_type = val2 
and id = result;

RETURN result;

END;

$BODY$
LANGUAGE plpgsql VOLATILE COST 100;

When I run this function in a loop of over a 1000 or more records it just does freezing and just says "query is running". When I check my table nothing is being updated. When I run it for one or two records it runs fine.

Example of the function when being run:

select get_result('123','idtype');

table columns:

id character varying(200),
col1 character varying(200),
id_used character varying(1),
id_used_date timestamp without time zone,
id_type character(200)

id is the table index.

Can someone help?

解决方案

Most probably you are running into race conditions. When you run your function a 1000 times in quick succession in separate transactions, something like this happens:

T1            T2            T3            ...
SELECT max(id) -- id 1
              SELECT max(id)  -- id 1
                            SELECT max(id)  -- id 1
                                          ...
              Row id 1 locked, wait ...
                            Row id 1 locked, wait ...
UPDATE id 1
                                          ... 

COMMIT
              Wake up, UPDATE id 1 again!
              COMMIT
                            Wake up, UPDATE id 1 again!
                            COMMIT
                                          ... 

Largely rewritten and simplified as SQL function:

CREATE OR REPLACE FUNCTION get_result(val1 text, val2 text)
  RETURNS text AS 
$func$
   UPDATE table t
   SET    id_used = 'Y'
        , col1 = val1
        , id_used_date = now() 
   FROM  (
      SELECT id
      FROM   table 
      WHERE  id_used IS NULL
      AND    id_type = val2
      ORDER  BY id
      LIMIT  1
      FOR    UPDATE   -- lock to avoid race condition! see below ...
      ) t1
   WHERE  t.id_type = val2
   -- AND    t.id_used IS NULL -- repeat condition (not if row is locked)
   AND    t.id = t1.id
   RETURNING  id;
$func$  LANGUAGE sql;

Related question with a lot more explanation:

Explain

  • Don't run two separate SQL statements. That is more expensive and widens the time frame for race conditions. One UPDATE with a subquery is much better.

  • You don't need PL/pgSQL for the simple task. You still can use PL/pgSQL, the UPDATE stays the same.

  • You need to lock the selected row to defend against race conditions. But you cannot do this with the aggregate function you head because, per documentation:

The locking clauses cannot be used in contexts where returned rows cannot be clearly identified with individual table rows; for example they cannot be used with aggregation.

  • Bold emphasis mine. Luckily, you can replace min(id) easily with the equivalent ORDER BY / LIMIT 1 I provided above. Can use an index just as well.

  • If the table is big, you need an index on id at least. Assuming that id is indexed already as PRIMARY KEY, that would help. But this additional partial multicolumn index would probably help a lot more:

    CREATE INDEX foo_idx ON table (id_type, id)
    WHERE id_used IS NULL;
    

Alternative solutions

Advisory locks May be the superior approach here:

Or you may want to lock many rows at once:

这篇关于函数永远运行大量记录的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆