Database Backed Work Queue


Question

My situation...

I have a set of workers that are scheduled to run periodically, each at a different interval, and I would like to find a good implementation to manage their execution.

Example: Let's say I have a worker that goes to the store and buys me milk once a week. I would like to store this job and its configuration in a MySQL table. But it seems like a really bad idea to poll the table (every second?) to see which jobs are ready to be put into the execution pipeline.

All of my workers are written in JavaScript, so I'm using node.js for execution and beanstalkd as a pipeline.

If new jobs (i.e. scheduling a worker to run at a given time) are being created asynchronously and I need to store the job result and configuration persistently, how do I avoid polling a table?

Thanks!

Recommended answer

I agree that it seems inelegant, but given the way computers work, something *somewhere* is going to have to do polling of some kind in order to figure out which jobs to execute when. So, let's go over some of your options:

  1. Poll the database table. This isn't a bad idea at all - it's probably the simplest option if you're storing the jobs in MySQL anyway. A rate of one query per second is nothing - give it a try and you'll notice that your system doesn't even feel it.

Some ideas to help you scale this to possibly hundreds of queries per second, or just to keep system resource requirements down:

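  • Create a second table, "job_pending", that holds the jobs that need to execute within the next X seconds/minutes/hours.
  • Run the query against the big table of all jobs only once in a long while, and use it to populate the small table, which you query at short intervals.
  • Remove executed jobs from the small table to keep it small.
  • Use an index on your "execute_time" (or whatever you call it) column.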
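The two-table idea above can be sketched in JavaScript as follows. This is only an illustration under stated assumptions: a plain array and a `Map` stand in for the big jobs table and the small "job_pending" table, and the names `refillPending`, `takeDue`, `executeTime` and `done` are hypothetical, not part of any library.

```javascript
// In-memory stand-ins for the big jobs table (array) and the small
// "job_pending" table (Map keyed by job id). All names are hypothetical.

// Runs rarely (say once a minute): copy every job that is due within the
// next `horizonMs` milliseconds into the small pending set.
function refillPending(allJobs, pending, now, horizonMs) {
  for (const job of allJobs) {
    const dueSoon = job.executeTime <= now + horizonMs;
    if (dueSoon && !job.done && !pending.has(job.id)) {
      pending.set(job.id, job);
    }
  }
}

// Runs often (say once a second): hand back the jobs that are due now and
// delete them from the pending set so that it stays small.
function takeDue(pending, now) {
  const due = [...pending.values()]
    .filter((job) => job.executeTime <= now)
    .sort((a, b) => a.executeTime - b.executeTime);
  for (const job of due) {
    job.done = true;
    pending.delete(job.id);
  }
  return due;
}
```

With a real MySQL table, the `filter` on `executeTime` would become a `WHERE` clause on the indexed "execute_time" column, which is what keeps the frequent query cheap.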

  2. If you have to scale even further, keep the main jobs table in the database and use the second, smaller table I suggested, but put that table in RAM: either as a memory table in the DB engine, or as a queue of some kind in your program. Query the queue at extremely short intervals if you have to - it'll take some extreme use cases to cause any performance issues here.

The main issue with this option is that you'll have to keep track of jobs that were in memory but didn't execute, e.g. due to a system crash - more coding for you...
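One possible way to do that bookkeeping is to record in the persistent table which jobs have been pulled into memory, and to reset them at startup. The sketch below assumes a single scheduler process and a hypothetical `status` field with the values "scheduled", "in_memory" and "done"; an array stands in for the database table.

```javascript
// `table` stands in for the persistent jobs table; the status values and
// function names are assumptions made for this sketch.

// Pull jobs due within the horizon into memory and record that fact
// persistently, so that a crash leaves a trace.
function claimJobs(table, now, horizonMs) {
  const claimed = table.filter(
    (job) => job.status === "scheduled" && job.executeTime <= now + horizonMs
  );
  for (const job of claimed) job.status = "in_memory";
  return claimed;
}

// Call once a job has actually run.
function markDone(job) {
  job.status = "done";
}

// Call at startup: with a single scheduler process, anything still marked
// "in_memory" was loaded before a crash and never finished, so re-schedule it.
function recoverOrphans(table) {
  const orphans = table.filter((job) => job.status === "in_memory");
  for (const job of orphans) job.status = "scheduled";
  return orphans.length;
}
```

A job that crashed midway through its work would run again after recovery, so this approach suits jobs that are safe to repeat.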

  3. Create a thread for each of a bunch of jobs (say, all jobs that need to execute in the next minute), and call thread.sleep(millis_until_execution_time) (or whatever the equivalent is; I'm not that familiar with node.js).

This option has the same problem as no. 2: you have to keep track of job execution for crash recovery. It's also the most wasteful imo - every sleeping job thread still takes system resources.
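In node.js the closest counterpart to a sleeping thread per job is probably one `setTimeout` timer per job. The sketch below swaps that in for `thread.sleep`; `scheduleBatch`, `executeTime` and the injectable `setTimer` parameter are hypothetical names chosen for illustration.

```javascript
// Schedule one timer per job in the batch. `setTimer` defaults to Node's
// setTimeout and is injectable so the function can be tried without waiting.
// Returns whatever handles the timer function produces (for clearTimeout).
function scheduleBatch(batch, now, run, setTimer = setTimeout) {
  return batch.map((job) => {
    // Jobs that are already overdue run as soon as possible.
    const delayMs = Math.max(0, job.executeTime - now);
    return setTimer(() => run(job), delayMs);
  });
}
```

A call such as `scheduleBatch(jobsDueThisMinute, Date.now(), enqueue)` would arm the timers, where `enqueue` is a placeholder for whatever hands the job to the pipeline. The timers live only in process memory, so the crash-recovery concern above applies unchanged. Keeping batches short also avoids `setTimeout`'s upper limit of 2147483647 ms (about 24.8 days) on the delay.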

There may be additional options of course - I hope that others answer with more ideas.

Just realize that polling the DB every second isn't a bad idea at all. It's the most straightforward way imo (remember KISS), and at this rate you shouldn't have performance issues, so avoid premature optimization.
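For reference, a once-per-second poll could look roughly like the sketch below. It is a minimal illustration, not a definitive implementation: an array stands in for the MySQL table, and `pollDueJobs`, `startPolling`, the `status` values and the `enqueue` callback (a placeholder for a beanstalkd put) are all assumed names.

```javascript
// Stand-in for a MySQL jobs table; field names are hypothetical.
const jobTable = [
  { id: 1, name: "buy-milk", executeTime: 1000, status: "scheduled" },
  { id: 2, name: "pay-rent", executeTime: 5000, status: "scheduled" },
];

// Roughly what a query for rows with status = 'scheduled' and
// execute_time <= now would return, followed by an update that marks
// those rows as queued so they are not picked up twice.
function pollDueJobs(table, now) {
  const due = table.filter(
    (job) => job.status === "scheduled" && job.executeTime <= now
  );
  for (const job of due) job.status = "queued";
  return due;
}

// Poll once per second and hand every due job to `enqueue`.
// Returns the interval handle so the caller can stop it with clearInterval.
function startPolling(table, enqueue, intervalMs = 1000) {
  return setInterval(() => {
    for (const job of pollDueJobs(table, Date.now())) enqueue(job);
  }, intervalMs);
}
```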
