Can Lost Update happen in read committed isolation level in PostgreSQL?


Question

I have a query like the following in PostgreSQL:

  UPDATE
      queue
  SET
      status = 'PROCESSING'
  WHERE
      status = 'WAITING' AND
      id     = (SELECT id FROM queue WHERE status = 'WAITING' LIMIT 1)
  RETURNING
      id

and many workers try to process one work item at a time (that's why I have the subquery with LIMIT 1). After this update, each worker grabs the returned id and processes the work, but sometimes two workers grab the same work and process it twice or more. The isolation level is Read Committed.



My question is: how can I guarantee that each work item is processed only once? I know there are many posts out there about this, but I can say I have tried most of them and it didn't help:




  • I tried SELECT FOR UPDATE, but it caused deadlocks.

  • I tried pg_try_advisory_xact_lock, but it caused "out of shared
    memory" errors.

  • I tried adding AND pg_try_advisory_xact_lock(queue.id) to the outer query's WHERE clause, but ... [?]



Any help would be appreciated.

Solution

A lost update won't occur in the situation you describe, but it won't work properly either.



What will happen in the example you've given is that, with (say) 10 workers started simultaneously, all 10 of them will execute the subquery and get the same ID. They will all attempt to lock that ID. One of them will succeed; the others will block on the first one's lock. Once the first backend commits or rolls back, the 9 others will race for the lock. One will get it, re-check the WHERE clause, see that the queue.status test no longer matches, and return without modifying any rows. The same will happen with the other 8. So you used 10 queries to do the work of one.
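To make that sequence concrete, here is a sketch of two concurrent sessions running the question's UPDATE (the id value 42 is made up for illustration):

  -- Session A:
  BEGIN;
  UPDATE queue SET status = 'PROCESSING'
  WHERE status = 'WAITING'
    AND id = (SELECT id FROM queue WHERE status = 'WAITING' LIMIT 1)
  RETURNING id;
  -- returns e.g. id = 42 and takes a row lock on that row

  -- Session B, started at the same time: its subquery also picked id = 42,
  -- so its UPDATE blocks here, waiting on Session A's row lock.

  -- Session A:
  COMMIT;

  -- Session B now wakes up, re-checks the WHERE clause under Read Committed,
  -- sees status = 'PROCESSING' (no longer 'WAITING'), updates 0 rows,
  -- and its RETURNING produces no rows.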



If you fail to explicitly check the UPDATE result and see that zero rows were updated, you might think you were getting lost updates, but you aren't. You just have a concurrency bug in your application caused by a misunderstanding of the order-of-execution and isolation rules. All that's really happening is that you're effectively serializing your backends, so that only one at a time actually makes forward progress.



The only way PostgreSQL could avoid having them all get the same queue item ID would be to serialize them, so that it didn't start executing query #2 until query #1 finished. If you want, you can do this by LOCKing the queue table ... but then you might as well have just one worker.
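A sketch of that LOCK-based serialization (the EXCLUSIVE lock mode here is one reasonable choice, not something the answer prescribes; it blocks all other writers while still allowing plain reads):

  BEGIN;
  LOCK TABLE queue IN EXCLUSIVE MODE;  -- other workers queue up here
  UPDATE queue SET status = 'PROCESSING'
  WHERE status = 'WAITING'
    AND id = (SELECT id FROM queue WHERE status = 'WAITING' LIMIT 1)
  RETURNING id;
  COMMIT;  -- releases the table lock; the next worker proceeds

Every worker now runs strictly one at a time, which is exactly the "might as well have one worker" trade-off.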



You can't get around this with advisory locks, not easily anyway. Hacks where you iterate down the queue using non-blocking lock attempts until you get the first lockable item would work, but would be slow and clumsy.
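For completeness, that non-blocking-lock hack might look roughly like this (a sketch only: the planner may evaluate pg_try_advisory_xact_lock for more rows than the LIMIT suggests, so extra items can end up locked until the transaction ends):

  UPDATE queue SET status = 'PROCESSING'
  WHERE status = 'WAITING'
    AND id = (
      SELECT id FROM queue
      WHERE status = 'WAITING'
        AND pg_try_advisory_xact_lock(id)  -- skip items other workers hold
      ORDER BY id
      LIMIT 1
    )
  RETURNING id;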



You are attempting to implement a work queue using the RDBMS. This will not work well. It will be slow, it will be painful, and getting it both correct and fast will be very, very hard. Don't roll your own. Instead, use a well-established, well-tested system for reliable task queueing. Look at RabbitMQ, ZeroMQ, Apache ActiveMQ, Celery, etc. There is also PGQ from Skytools, a PostgreSQL-based solution.





