多线程处理数据库记录的最佳做法 [英] Best practices for multithreaded processing of database records

查看:569
本文介绍了多线程处理数据库记录的最佳做法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个单独的进程查询表中的 PROCESS_IND ='N',进行一些处理,然后更新 PROCESS_IND 更改为Y。



我想允许此进程的多个实例运行,但不知道避免并发问题的最佳做法。



我应该从哪里开始?

解决方案

使用如下:




  • 创建列lockedby和locktime,分别是线程/进程/机器ID和时间戳

  • 每个任务都会执行一个查询,例如:



    UPDATE taskstable SET lockedby =(my id),locktime = now()WHERE lockedby IS NULL ORDER BY ID LIMIT 10



$ b b

其中10是批量大小。




  • 然后,每个任务执行一个SELECT,以确定哪些行被锁定处理,并处理这些

  • 在每一行完成后,您将lockedby和locktime设置为NULL

  • 所有这一切都是在一个循环中完成的。 b
  • cron作业或计划任务定期重置锁定时间过长的任何行的被锁定,因为它们可能由挂起或崩溃的任务完成。


  • LIMIT 10是MySQL特有的,但我认为其他数据库有等同的。 ORDER BY是import,以避免查询是不确定的。


    I have a single process that queries a table for records where PROCESS_IND = 'N', does some processing, and then updates the PROCESS_IND to 'Y'.

    I'd like to allow for multiple instances of this process to run, but don't know what the best practices are for avoiding concurrency problems.

    Where should I start?

    解决方案

    The pattern I'd use is as follows:

    • Create columns "lockedby" and "locktime" which are a thread/process/machine ID and timestamp respectively (you'll need the machine ID when you split the processing between several machines)
    • Each task would do a query such as:

      UPDATE taskstable SET lockedby=(my id), locktime=now() WHERE lockedby IS NULL ORDER BY ID LIMIT 10

    Where 10 is the "batch size".

    • Then each task does a SELECT to find out which rows it's "locked" for processing, and processes those
    • After each row is complete, you set lockedby and locktime back to NULL
    • All this is done in a loop for as many batches as exist.
    • A cron job, or scheduled task, periodically resets the "lockedby" of any row whose locktime is too long ago, as they were presumably done by a task which has hung or crashed. Someone else will then pick them up

    The LIMIT 10 is MySQL specific but I think other databases have equivalents. The ORDER BY is import to avoid the query being nondeterministic.

    这篇关于多线程处理数据库记录的最佳做法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆