数据库支持工作队列 [英] Database Backed Work Queue

查看:221
本文介绍了数据库支持工作队列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的情况...

我有一组的的被安排在不同的时间间隔定期运行,每一个,想找到一个很好的实现来管理其执行。

I have a set of workers that are scheduled to run periodically, each at different intervals, and would like to find a good implementation to manage their execution.

例子:比方说,我有一个工人是去商店,买了我一个星期一次奶。我想存储这个工作,它在一个MySQL表配置。但是,这似乎是一个真正坏主意轮询表(每一秒?),看看哪些作业准备投入到执行流水线。

Example: Let's say I have a worker that goes to the store and buys me milk once a week. I would like to store this job and it's configuration in a mysql table. But, it seems like a really bad idea to poll the table (every second?) and see which jobs are ready to be put into the execution pipeline.

我所有的工人都写在JavaScript中,所以我用了执行和 beanstalkd 作为管道。

All of my workers are written in javascript, so I'm using node.js for execution and beanstalkd as a pipeline.

如果新的就业机会(即安排一个工人在给定的时间运行)正在异步创建,我需要存储的作业结果和配置持久,如何避免轮询表?

If new jobs (ie. scheduling a worker to run at a given time) are being created asynchronously and I need to store the job result and configuration persistently, how do I avoid polling a table?

谢谢!

推荐答案

我同意这似乎不雅,但鉴于方式,电脑工作的的东西的*地方*将不得不做的轮询的某种的为了弄清楚哪些作业时执行。所以,让我们去一下自己的选择:

I agree that it seems inelegant, but given the way that computers work something *somewhere* is going to have to do polling of some kind in order to figure out which jobs to execute when. So, let's go over some of your options:


  1. 轮询数据库表。这不是一个坏主意 - 它可能是最简单的选项,如果你存储在MySQL中的工作反正。每秒查询的速度是什么 - 试试看,你会发现你的系统甚至不觉得它。

  1. Poll the database table. This isn't a bad idea at all - it's probably the simplest option if you're storing the jobs in MySQL anyway. A rate of one query per second is nothing - give it a try and you'll notice that your system doesn't even feel it.

一些想法,以帮助你达到这可能是每秒数百查询,或只保留系统资源的要求下来:

Some ideas to help you scale this to possibly hundreds of queries per second, or just keep system resource requirements down:


  • 创建第二个表,job_pending',你把需要进行下一个X秒/分钟/小时内执行的作业。

  • 运行查询您的所有作业大表更长,而只有一次,然后填充它查询,而每一个较短的小桌子。

  • 删除从该小桌子,以保持它的小执行的作业。

  • 在你的'execute_time(或者无论你怎么称呼它)列使用一个索引。

如果您需要进一步扩展,保持主要工作表在数据库中,并使用第二个,较小的表,我建议,只要把该表在RAM:无论是作为数据库引擎的存储表,或者某种在你的程序的队列。在极短的时间间隔查询队列,如果你有太多 - 它会采取一些极端的用例来这里造成任何性能问题

If you have to scale even further, keep the main jobs table in the database, and use the second, smaller table I suggest, just put that table in RAM: either as a memory table in the DB engine, or in a Queue of some kind in your program. Query the queue at extremely short intervals if you have too - it'll take some extreme use cases to cause any performance issues here.

此选项的主要问题是,你必须持续跟踪是在内存中,但没有执行的作业,例如由于系统崩溃 - 更多的编码你...

The main issue with this option is that you'll have to keep track of jobs that were in memory but didn't execute, e.g. due to a system crash - more coding for you...

创建每个一堆工作(比如,需要在下一分钟要执行的所有作业),并调用了Thread.Sleep(millis_until_execution_time)(或什么的,我没那么熟悉的一个线程Node.js的)。

Create a thread for each of a bunch of jobs (say, all jobs that need to execute in the next minute), and call thread.sleep(millis_until_execution_time) (or whatever, I'm not that familiar with node.js).

这个选项有同样的问题,因为没有。 2 - 你必须跟踪作业执行的崩溃恢复。这也是最浪费IMO - 每间寝室的工作线程仍然需要系统资源。

This option has the same problem as no. 2 - where you have to keep track job execution for crash recovery. It's also the most wasteful imo - every sleeping job thread still takes system resources.

有可能,当然附加选项 - 我希望别人有更多的想法回答。

There may be additional options of course - I hope that others answer with more ideas.

只要知道查询数据库每秒钟是不是一个坏主意。这是最简单的方式国际海事组织(记住KISS),并以这样的速度,你不应该有性能问题,所以要避免premature优化。

Just realize that polling the DB every second isn't a bad idea at all. It's the most straightforward way imo (remember KISS), and at this rate you shouldn't have performance issues so avoid premature optimizations.

这篇关于数据库支持工作队列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆