函数永远运行大量记录 [英] Function taking forever to run for large number of records

查看：141 发布时间：2016/12/26 17:50:50 postgresql function concurrency sql-update plpgsql

本文介绍了函数永远运行大量记录的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我在Postgres 9.3.5中创建了以下函数：

  CREATE OR REPLACE FUNCTION get_result（val1 text，val2 text ）
 RETURNS text AS 
 $ BODY 
 $声明
 
结果文本; 
 
 BEGIN 
 
从表
中选择min（id）到结果中，其中id_used为null，id_type = val2; 
 
 update table set 
 id_used ='Y'，
 col1 = val1，
 id_used_date = now（）
其中id_type = val2 
和id = result; 
 
 RETURN result; 
 
 END; 
 
 $ BODY $ 
 LANGUAGE plpgsql VOLATILE COST 100;

当我在超过1000条记录的循环中运行此函数时，说查询正在运行。当我检查我的表没有更新。当我运行它一两个记录它运行良好。

运行时的函数示例：

  select get_result（'123'，'idtype'）;

表列：

  id字符变化（200），
 col1字符变化（200），
 id_used字符变化（1），
 id_used_date没有时区的时间戳，
 id_type字符（200）

id

有人可以帮助吗？

解决方案

正在进入竞态条件。当您在独立事务中快速连续运行您的函数1000次时，会发生以下情况：

  T1 T2 T3 ... 
 SELECT max（id） -  id 1 
 SELECT max（id） -  id 1 
 SELECT max（id） -  id 1 
 ... 
行id 1锁定，等待... 
行id 1锁定，等待... 
 UPDATE id 1 
 ... 
 $ b b COMMIT 
唤醒，UPDATE id 1！ 
 COMMIT 
唤醒，UPDATE id 1了！ 
 COMMIT 
 ...

大量重写和简化为SQL函数： / p>

  CREATE OR REPLACE FUNCTION get_result（val1 text，val2 text）
 RETURNS text AS 
 $ func $ 
 UPDATE table t 
 SET id_used ='Y'
，col1 = val1 
，id_used_date = now（）
 FROM（
 SELECT id 
 FROM table 
 WHERE id_used IS NULL 
 AND id_type = val2 
 ORDER BY id 
 LIMIT 1 
 FOR UPDATE  - 锁定以避免竞争条件！ 
）t1 
 WHERE t.id_type = val2 
  -  AND t.id_used IS NULL  - 重复条件（不是如果行被锁定）
 AND t.id = t1。 id 
 RETURNING id; 
 $ func $ LANGUAGE sql;

说明

不要运行两个单独的SQL语句。这是更昂贵的，并加宽了种族条件的时间框架。

你不需要PL / pgSQL的简单的任务。您仍然可以使用PL / pgSQL， UPDATE 保持不变。

p>您需要锁定所选行以防范竞争条件。但是你不能使用聚合函数来执行此操作，因为每个文档：

锁定子句不能用于上下文中返回的行
无法用单个表行清楚地标识;例如
，不能与汇总一起使用。

大胆强调我。幸运的是，您可以轻松地用 ORDER BY / LIMIT 1替换 min（id） 我在上面提供。

如果表格很大，您需要 id 。假设 id 已经被索引为 PRIMARY KEY ，这将有所帮助。但是，此额外的部分多列索引可能会帮助更多：
```
  CREATE INDEX foo_idx ON表（id_type，id）
 WHERE id_used IS NULL; 
  
```

替代解决方案

咨询锁可能是最好的方法：

Postgres UPDATE ... LIMIT 1

<您可能希望一次锁定多行：

如何在表中标记某些nr行的并发访问

I have created the following function in Postgres 9.3.5:

CREATE OR REPLACE FUNCTION get_result(val1 text, val2 text)
RETURNS text AS 
$BODY
$Declare

result text;

BEGIN

select min(id) into result from table 
where id_used is null and id_type = val2;

update table set 
id_used = 'Y', 
col1 = val1,  
id_used_date = now() 
where id_type = val2 
and id = result;

RETURN result;

END;

$BODY$
LANGUAGE plpgsql VOLATILE COST 100;

When I run this function in a loop of over a 1000 or more records it just does freezing and just says "query is running". When I check my table nothing is being updated. When I run it for one or two records it runs fine.

Example of the function when being run:

select get_result('123','idtype');

table columns:

id character varying(200),
col1 character varying(200),
id_used character varying(1),
id_used_date timestamp without time zone,
id_type character(200)

id is the table index.

Can someone help?

解决方案

Most probably you are running into race conditions. When you run your function a 1000 times in quick succession in separate transactions, something like this happens:

T1            T2            T3            ...
SELECT max(id) -- id 1
              SELECT max(id)  -- id 1
                            SELECT max(id)  -- id 1
                                          ...
              Row id 1 locked, wait ...
                            Row id 1 locked, wait ...
UPDATE id 1
                                          ... 

COMMIT
              Wake up, UPDATE id 1 again!
              COMMIT
                            Wake up, UPDATE id 1 again!
                            COMMIT
                                          ...

Largely rewritten and simplified as SQL function:

CREATE OR REPLACE FUNCTION get_result(val1 text, val2 text)
  RETURNS text AS 
$func$
   UPDATE table t
   SET    id_used = 'Y'
        , col1 = val1
        , id_used_date = now() 
   FROM  (
      SELECT id
      FROM   table 
      WHERE  id_used IS NULL
      AND    id_type = val2
      ORDER  BY id
      LIMIT  1
      FOR    UPDATE   -- lock to avoid race condition! see below ...
      ) t1
   WHERE  t.id_type = val2
   -- AND    t.id_used IS NULL -- repeat condition (not if row is locked)
   AND    t.id = t1.id
   RETURNING  id;
$func$  LANGUAGE sql;

Related question with a lot more explanation:

Atomic UPDATE .. SELECT in Postgres

Explain

Don't run two separate SQL statements. That is more expensive and widens the time frame for race conditions. One UPDATE with a subquery is much better.
You don't need PL/pgSQL for the simple task. You still can use PL/pgSQL, the UPDATE stays the same.
You need to lock the selected row to defend against race conditions. But you cannot do this with the aggregate function you head because, per documentation:

The locking clauses cannot be used in contexts where returned rows cannot be clearly identified with individual table rows; for example they cannot be used with aggregation.

Bold emphasis mine. Luckily, you can replace min(id) easily with the equivalent ORDER BY / LIMIT 1 I provided above. Can use an index just as well.
If the table is big, you need an index on id at least. Assuming that id is indexed already as PRIMARY KEY, that would help. But this additional partial multicolumn index would probably help a lot more:
```
CREATE INDEX foo_idx ON table (id_type, id)
WHERE id_used IS NULL;
```

Alternative solutions

Advisory locks May be the superior approach here:

Postgres UPDATE ... LIMIT 1

Or you may want to lock many rows at once:

How to mark certain nr of rows in table on concurrent access

这篇关于函数永远运行大量记录的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

函数永远运行大量记录 [英] Function taking forever to run for large number of records

问题描述

说明

替代解决方案

Explain

Alternative solutions

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

函数永远运行大量记录 [英] Function taking forever to run for large number of records

问题描述

说明

替代解决方案

Explain

Alternative solutions

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭