多线程处理数据库记录的最佳做法 [英] Best practices for multithreaded processing of database records
问题描述
我有一个单独的进程查询表中的 PROCESS_IND
='N',进行一些处理,然后更新 PROCESS_IND
更改为Y。
我想允许此进程的多个实例运行,但不知道避免并发问题的最佳做法。
我应该从哪里开始?
使用如下:
- 创建列lockedby和locktime,分别是线程/进程/机器ID和时间戳
-
每个任务都会执行一个查询,例如:
UPDATE taskstable SET lockedby =(my id),locktime = now()WHERE lockedby IS NULL ORDER BY ID LIMIT 10
$ b b
其中10是批量大小。
- 然后,每个任务执行一个SELECT,以确定哪些行被锁定处理,并处理这些
- 在每一行完成后,您将lockedby和locktime设置为NULL
- 所有这一切都是在一个循环中完成的。 b
- cron作业或计划任务定期重置锁定时间过长的任何行的被锁定,因为它们可能由挂起或崩溃的任务完成。
- Create columns "lockedby" and "locktime" which are a thread/process/machine ID and timestamp respectively (you'll need the machine ID when you split the processing between several machines)
Each task would do a query such as:
UPDATE taskstable SET lockedby=(my id), locktime=now() WHERE lockedby IS NULL ORDER BY ID LIMIT 10
- Then each task does a SELECT to find out which rows it's "locked" for processing, and processes those
- After each row is complete, you set lockedby and locktime back to NULL
- All this is done in a loop for as many batches as exist.
- A cron job, or scheduled task, periodically resets the "lockedby" of any row whose locktime is too long ago, as they were presumably done by a task which has hung or crashed. Someone else will then pick them up
LIMIT 10是MySQL特有的,但我认为其他数据库有等同的。 ORDER BY是import,以避免查询是不确定的。
I have a single process that queries a table for records where PROCESS_IND
= 'N', does some processing, and then updates the PROCESS_IND
to 'Y'.
I'd like to allow for multiple instances of this process to run, but don't know what the best practices are for avoiding concurrency problems.
Where should I start?
The pattern I'd use is as follows:
Where 10 is the "batch size".
The LIMIT 10 is MySQL specific but I think other databases have equivalents. The ORDER BY is import to avoid the query being nondeterministic.
这篇关于多线程处理数据库记录的最佳做法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!