Updating database rows without locking the table in PostgreSQL 9.2


Question

Trying to run an update statement like this on a table, using PostgreSQL 9.2:

UPDATE table
    SET a_col = array[col];

We need to be able to run this on a ~10M row table, and not have it lock up the table (so normal operations can still happen while the update is running). I believe using a cursor will probably be the right solution, but I really have no idea if it is or how I should implement it using a cursor.

I have come up with this cursor code, which I think might be good.

Edit: Added cursor function

CREATE OR REPLACE FUNCTION update_fields() RETURNS VOID AS $$
DECLARE
        cursor CURSOR FOR SELECT * FROM table ORDER BY id FOR UPDATE;
BEGIN
        FOR row IN cursor LOOP
                UPDATE table SET
                        a_col = array[col],
                        a_col2= array[col2]
                WHERE CURRENT OF cursor;
        END LOOP;
END;
$$ LANGUAGE plpgsql;

Solution

MVCC

First off, if "normal operations" consist of SELECT queries, the MVCC model will take care of it automatically. UPDATE does not block SELECT and vice versa. SELECT only sees committed data (or what's been done in the same transaction), so the result of the big UPDATE remains invisible to other transactions until it's done (committed).
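To make the MVCC point concrete, here is a minimal two-session sketch. It assumes a table named tbl_org with an integer id column, neither of which the question specifies; run the two halves in separate psql sessions.

```sql
-- Session A: start the big update but do not commit yet.
BEGIN;
UPDATE tbl_org SET a_col = ARRAY[col];

-- Session B (while A is still open): this SELECT is not blocked
-- and still returns the old, committed value of a_col.
SELECT a_col FROM tbl_org WHERE id = 1;

-- Session A: only after this COMMIT do other sessions see the new values.
COMMIT;
```

Concurrent writers are a different matter: an UPDATE or DELETE from session B that touches a row already updated by session A waits until A commits or rolls back.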

Performance / bloat

If you don't have other objects referencing that table,
and you don't have concurrent write operations (which would be lost!),
and you can afford a very short exclusive lock on the table,
and you have the additional disk space, of course:
You could keep locking to a minimum by building an updated version of the table in the background. Make sure it has everything it needs to be a drop-in replacement, then drop the original and rename the copy.

CREATE TABLE tbl_new (LIKE tbl_org INCLUDING CONSTRAINTS);

INSERT INTO tbl_new
SELECT col_a, col_b, ARRAY[col] AS col_c
FROM   tbl_org;

I am using CREATE TABLE (LIKE .. INCLUDING CONSTRAINTS), because (quoting the manual here):

Not-null constraints are always copied to the new table. CHECK constraints will only be copied if INCLUDING CONSTRAINTS is specified; other types of constraints will never be copied.

Make sure the new table is ready. Then:

DROP TABLE tbl_org;
ALTER TABLE tbl_new RENAME TO tbl_org;

This results in a very short time window during which the table is locked exclusively.

This is really only about performance. It creates a new table without any bloat rather quickly. If you have foreign keys or views, you can still go that route, but you have to prepare a script to drop and recreate these objects, potentially creating additional exclusive locks.

Concurrent writes

With concurrent write operations, all you can really do is split your update into chunks. You can't do that in a single transaction, since locks are only released at the end of a transaction.
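As a rough sketch of what chunking could look like, assuming the table is named tbl_org and has an integer primary key id with reasonably dense values (the chunk size of 100000 is arbitrary), each statement below is meant to run as its own transaction, for example in autocommit mode from psql or a client script:

```sql
-- Each UPDATE commits on its own, so row locks are held only
-- for the duration of one chunk rather than the whole table.
UPDATE tbl_org SET a_col = ARRAY[col] WHERE id >= 1      AND id < 100001;
UPDATE tbl_org SET a_col = ARRAY[col] WHERE id >= 100001 AND id < 200001;
UPDATE tbl_org SET a_col = ARRAY[col] WHERE id >= 200001 AND id < 300001;
-- ... continue in the same pattern until max(id) is covered.
```

Smaller chunks shorten the time any single row stays locked, at the cost of more round trips. Because the chunks commit independently, readers may see a mix of updated and not-yet-updated rows while the job is in progress.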

You could employ dblink, which can launch independent transactions on another database, including the current one. This way you could do it all in a single DO statement or a plpgsql function with a loop. A loosely related answer has more information on dblink.
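For illustration only, the dblink variant might look roughly like the following DO block. It is a sketch under several assumptions: the dblink extension is installed (CREATE EXTENSION dblink), the table is tbl_org with an integer primary key id, and a local connection using just the database name is permitted by your authentication setup. The connection name chunk_conn and the chunk size are made up for the example.

```sql
DO
$$
DECLARE
   _min  bigint;
   _max  bigint;
   _lo   bigint;
   _step bigint := 50000;  -- hypothetical chunk size, tune as needed
BEGIN
   SELECT min(id), max(id) INTO _min, _max FROM tbl_org;

   -- Open a second connection to the same database.
   PERFORM dblink_connect('chunk_conn', 'dbname=' || current_database());

   _lo := _min;
   WHILE _lo <= _max LOOP
      -- Each dblink_exec call runs in its own transaction on the
      -- second connection and commits before the next chunk starts.
      PERFORM dblink_exec(
         'chunk_conn',
         format('UPDATE tbl_org SET a_col = ARRAY[col] WHERE id >= %s AND id < %s',
                _lo, _lo + _step));
      _lo := _lo + _step;
   END LOOP;

   PERFORM dblink_disconnect('chunk_conn');
END
$$;
```

If a chunk fails, dblink_exec raises an error and the DO block aborts, but chunks already committed over the second connection stay committed. The named connection may also be left open in that case, so a rerun might need dblink_disconnect('chunk_conn') first.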

Your approach with cursors

A cursor inside the function will not buy you anything. Any function is enclosed in a transaction automatically, and all locks are only released at the end of the transaction. Even if you used CLOSE cursor (which you don't), it would only free some resources, not release the locks acquired on the table. I quote the manual:

CLOSE closes the portal underlying an open cursor. This can be used to release resources earlier than end of transaction, or to free up the cursor variable to be opened again.

You would need to run separate transactions or (ab)use dblink which does that for you.
