Postgres,更新和锁定顺序 [英] Postgres, update and lock ordering

查看:235
本文介绍了Postgres,更新和锁定顺序的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在研究Postgres 9.2.

有2个UPDATE,每个UPDATE都在各自的事务中.一个看起来像:

UPDATE foo SET a=1 WHERE b IN (1,2,3,4);

另一个相似:

UPDATE foo SET a=2 WHERE b IN (1,2,3,4);

这些可能同时运行,实际上'IN'表达式中有500+. 我有时会遇到僵局.确实,"IN"表达式中的项目顺序实际上可能不会影响真正的锁顺序?

解决方案

是.我认为这里的主要问题是IN在指定的集合中检查成员资格,但不赋予UPDATE某种形式的排序,这又意味着对锁的排序没有任何具体的排序. /p>

UPDATE语句中的WHERE子句的行为基本上与SELECT中的行为相同.例如,我经常使用SELECT模拟UPDATE来检查要更新的内容,以确保它符合我的期望.

考虑到这一点,以下使用SELECT的示例演示IN本身并不赋予顺序:

给出以下架构/数据:

create table foo
(
  id serial,
  val text
);

insert into foo (val)
values ('one'), ('two'), ('three'), ('four');

以下查询:

select *
from foo
where id in (1,2,3,4);


select *
from foo
where id in (4,3,2,1);

产生完全相同的结果-从id 1-4开始按行排列.

即使没有得到保证,因为我没有在选择中使用ORDER BY.相反,如果没有它, Postgres 将使用服务器决定的最快顺序(请参阅 sqlfiddle 和上面的代码.

可能的修复程序/解决方法:

关于您可以做什么,有多种选择,具体取决于您的要求.您可以显式地删除表上的表锁,尽管这样做当然会在其中序列化更新的效果,这可能会导致瓶颈太大.

仍然允许并发的另一个选项是使用动态 SQL 在例如 Python 中的显式迭代项.这样,您将具有一组始终以相同顺序发生的单行更新,并且由于可以确保一致的顺序,因此正常的 Postgres 锁定应能够处理并发而无需僵局.

在纯SQL中,它的性能不如批量更新,但是它应该可以解决锁定问题.提高性能的一种建议是仅经常COMMIT次,而不是每行之后一次-这样可以节省很多开销.

另一个选择是在用 PL/pgSQL 编写的 Postgres 函数中执行循环.然后可以在外部,例如 Python 中调用该函数,但是循环(也明确地)在服务器端进行,这可以节省一些开销,因为循环和UPDATEs是完全在服务器端完成,而不必在每次循环迭代时都进行繁琐的操作.

I'm working on Postgres 9.2.

There are 2 UPDATEs, each in their own transactions. One looks like:

UPDATE foo SET a=1 WHERE b IN (1,2,3,4);

The other is similar:

UPDATE foo SET a=2 WHERE b IN (1,2,3,4);

These could possibly run at the same time and in reality have 500+ in the 'IN' expression. I'm sometimes seeing deadlocks. Is is true that that order of items in the 'IN' expression may not actually influence the true lock ordering?

解决方案

Yes. I think the main issue here is that IN checks for membership in the set specified, but does not confer any sort of ordering on the UPDATE, which in turn, means that no concrete ordering is conferred upon the lock ordering.

The WHERE clause in an UPDATE statement essentially behaves the same way it does in a SELECT. For example, I will often simulate an UPDATE using a SELECT to check what will be updated to see that it's what I expected.

With that in mind, the following example using SELECT demonstrates that IN does not in itself confer ordering:

Given this schema/data:

create table foo
(
  id serial,
  val text
);

insert into foo (val)
values ('one'), ('two'), ('three'), ('four');

The following queries:

select *
from foo
where id in (1,2,3,4);


select *
from foo
where id in (4,3,2,1);

yield the exact same results -- the rows in order from id 1-4.

Even that isn't guaranteed, since I did not use an ORDER BY in the select. Rather, without it, Postgres uses whatever order the server decides is fastest (see point 8 about ORDER BY in the Postgres SELECT doc). Given a fairly static table, it's often the same order in which it was inserted (as was the case here). However, there's nothing guaranteeing that, and if there's a lot of churn on the table (lots of dead tuples, rows removed, etc.), it's less likely to be the case.

I suspect that's what's happening here with your UPDATE. Sometimes -- if not even most of the time -- it may end up in numerical order if that's the same way the rows were inserted, but there's nothing to guarantee that, and the cases where you see the deadlocks are likely scenarios where the data has changed such that one update is ordered different than the other.

sqlfiddle with the above code.

Possible fixes/workarounds:

In terms of what you could do about it, there are various options, depending on your requirements. You could explicitly take out a table lock on the table, although that would of course have the effect of serializing the updates there, which may prove to be too large a bottleneck.

Another option, which would still allow for concurrency -- is to explicitly iterate over the items using dynamic SQL in, say, Python. That way, you'd have a set of one-row updates that occurred always in the same order, and since you could ensure that consistent order, the normal Postgres locking should be able to handle the concurrency without deadlocking.

That won't perform as well as batch-updating in pure SQL, but it should solve the lock issue. One suggestion to bump up performance is to only COMMIT every so often, and not after every single row -- that saves a lot of overhead.

Another option would be to do the loop in a Postgres function written in PL/pgSQL. That function could then be called externally, in, say, Python, but the looping would be done (also explicitly) server-side, which may save on some overhead, since the looping and UPDATEs are done entirely server-side without having to go over the wire each loop iteration.

这篇关于Postgres,更新和锁定顺序的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆