How to mark certain nr of rows in table on concurrent access

Question

Our application has a table called cargo_items. It can be seen as a kind of queue for processing these items later. Initially there was a SINGLE job which took 3000 entries and processed them one after another. Later on, somebody decided to start 3 other instances of the same job. What happened is quite obvious: many of the items were processed twice.

My job is to make these processes work correctly when many instances are running at the same time. The solution I am going with for now is to mark 3000 entries in the database with a job_id, and later on fetch all of these entities and process them in isolation from the other processes.
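
For clarity, the later fetch step could look roughly like this (a minimal sketch; 'SOME_UUID' stands in for the UUID each job instance generates for itself):

-- fetch the rows this job instance has claimed
SELECT *
FROM   cargo_item
WHERE  job_id = 'SOME_UUID';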

My current approach for flagging these rows is the following:

UPDATE cargo_item item
SET job_id = 'SOME_UUID', job_ts = now()
FROM  ( 
   SELECT id
   FROM   cargo_item
   WHERE  state='NEW' AND job_id is null 
   LIMIT  3000
   FOR UPDATE
   ) sub
WHERE  item.id = sub.id;

Basically this approach locks the 3000 rows for the update. I am not sure if it is a good approach though.

On another thread I read about using advisory locks for this scenario.

What do you guys think about the current approach and about using advisory locks instead?

As suggested, I'd adapt the update statement like this:

UPDATE cargo_item item
SET job_id = 'SOME_UUID', job_ts = now()
FROM  ( 
   SELECT id
   FROM   cargo_item
   WHERE  state='NEW' AND job_id is null 
   ORDER  BY id
   LIMIT  3000
   FOR UPDATE
   ) sub
WHERE  item.id = sub.id;

Thanks Erwin and Tometzky for the hint. Nevertheless I wonder whether the way I am trying to solve the problem is a good one. Are there different approaches you'd think of?

Answer

In the related answer you are referring to:

The objective is to lock one row at a time. This works fine with or without advisory locks, because there is no chance for a deadlock - as long as you don't try to lock more rows in the same transaction.
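
As a rough illustration of that one-row-at-a-time pattern (a sketch only, built from your own query rather than the exact query in the linked answer), each worker would claim a single row per transaction:

UPDATE cargo_item item
SET    job_id = 'SOME_UUID', job_ts = now()
FROM  (
   SELECT id
   FROM   cargo_item
   WHERE  state = 'NEW' AND job_id IS NULL
   LIMIT  1       -- only one row per transaction, so lock ordering cannot deadlock
   FOR    UPDATE
   ) sub
WHERE  item.id = sub.id;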

Your example is different in that you want to lock 3000 rows at a time. There is potential for deadlock, except if all concurrent write operations lock rows in the same consistent order. Per documentation:

The best defense against deadlocks is generally to avoid them by being certain that all applications using a database acquire locks on multiple objects in a consistent order.

Implement that with an ORDER BY in your subquery.

UPDATE cargo_item item
SET job_id = 'SOME_UUID', job_ts = now()
FROM  ( 
   SELECT id
   FROM   cargo_item
   WHERE  state='NEW' AND job_id is null 
   ORDER  BY id
   LIMIT  3000
   FOR UPDATE
   ) sub
WHERE  item.id = sub.id;

This is safe and reliable, as long as all transactions acquire locks in the same order and concurrent updates of the ordering columns are not to be expected. (Read the yellow "CAUTION" box at the end of this chapter in the manual.) So this should be safe in your case, since you are not going to update the id column.

Effectively only one client at a time can manipulate rows this way. Concurrent transactions would try to lock the same (locked) rows and wait for the first transaction to finish.

Advisory locks are useful if you have many or very long running concurrent transactions (doesn't seem you do). With only a few, it will be cheaper overall to just use the above query and have concurrent transactions wait for their turn.
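
For reference only, an advisory-lock variant might look roughly like the following sketch. It keys pg_try_advisory_xact_lock() on the row id so that each session simply skips rows another session has already claimed; treat it as an assumption about how you would wire it up, not as part of the original answer:

UPDATE cargo_item item
SET    job_id = 'SOME_UUID', job_ts = now()
FROM  (
   SELECT id
   FROM   cargo_item
   WHERE  state = 'NEW' AND job_id IS NULL
   AND    pg_try_advisory_xact_lock(id)  -- skip rows whose advisory lock another session holds
   LIMIT  3000
   ) sub
WHERE  item.id = sub.id;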

It seems concurrent access isn't a problem per se in your setup. Concurrency is an issue created by your current solution.

Instead, do it all in a single UPDATE. Assign batches of n numbers (3000 in the example) to each UUID and update all at once. Should be fastest.

UPDATE cargo_item c
SET    job_id = u.uuid_col
     , job_ts = now()
FROM  (
   SELECT row_number() OVER () AS rn, uuid_col
   FROM   uuid_tbl WHERE  <some_criteria>  -- or see below
   ) u
JOIN (
   SELECT ((row_number() OVER () - 1) / 3000) + 1 AS rn, id  -- 1-based batch number
   FROM   cargo_item
   WHERE  state = 'NEW' AND job_id IS NULL
   FOR    UPDATE   -- just to be sure
   ) c2 USING (rn)
WHERE  c2.id = c.id;

Major points

• Integer division truncates. You get 1 for the first 3000 rows, 2 for the next 3000 rows, etc.

• I pick rows arbitrarily; you could apply ORDER BY in the window for row_number() to assign certain rows.

• If you don't have a table of UUIDs to dispatch (uuid_tbl), use a VALUES expression to supply them instead (see the sketch after this list).

• You get batches of 3000 rows. The last batch will be short of 3000 if you don't find a multiple of 3000 to assign.
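
A minimal sketch of that VALUES variant, assuming you only need to hand out a couple of ad-hoc UUIDs rather than read them from a table (the literals are placeholders in the style of 'SOME_UUID' above; cast them to uuid if that is job_id's actual type):

UPDATE cargo_item c
SET    job_id = u.uuid_col
     , job_ts = now()
FROM  (
   VALUES (1, 'SOME_UUID_1')   -- batch 1
        , (2, 'SOME_UUID_2')   -- batch 2
   ) u(rn, uuid_col)
JOIN (
   SELECT ((row_number() OVER () - 1) / 3000) + 1 AS rn, id
   FROM   cargo_item
   WHERE  state = 'NEW' AND job_id IS NULL
   FOR    UPDATE
   ) c2 USING (rn)
WHERE  c2.id = c.id;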
