cron-like recurring task scheduler design

Problem description

Say you want to schedule recurring tasks, such as:

  • Send an email every Wednesday at 10am
  • Create summary on the first day of every month

And you want to do this for a reasonable number of users in a web app, i.e. 100k users, each of whom can decide what they want scheduled and when.

And you want to ensure that the scheduled items run even if they were missed originally, e.g. if for some reason the email didn't get sent on Wednesday at 10am, it should get sent out at the next checking interval, say Wednesday at 11am.

How would you design that?

If you use cron to trigger your scheduling app every x minutes, what's a good way to implement the part that decides what should run at each point in time?

The cron-like implementations I've seen compare the current time to the trigger time for all specified items, but I'd like to deal with missed items as well.

I have a feeling there's a more clever design than the one I'm cooking up, so please enlighten me.

Solution

There are basically 2 designs.

One runs regularly and compares the current time to the scheduling spec (i.e. "Does this run now?"), and executes those that qualify.
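As a rough sketch of this first design, the check below compares "now" against a simplified spec. The `Spec` class, its fields, and the `SPECS` table are hypothetical names invented for illustration (a real system would more likely store cron strings per user); the point is only that the check is a pure function of the spec and the current time, with no stored state.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Spec:
    """Hypothetical simplified schedule spec; None means 'any value'."""
    minute: Optional[int] = None
    hour: Optional[int] = None
    weekday: Optional[int] = None  # Monday == 0, as in datetime.weekday()
    day: Optional[int] = None      # day of the month


def matches(spec: Spec, now: datetime) -> bool:
    """Answer 'does this run now?' by comparing each constrained field."""
    pairs = (
        (spec.minute, now.minute),
        (spec.hour, now.hour),
        (spec.weekday, now.weekday()),
        (spec.day, now.day),
    )
    return all(want is None or want == got for want, got in pairs)


def due_now(specs: dict, now: datetime) -> list:
    """Return the names of all items whose spec matches the current time."""
    return [name for name, spec in specs.items() if matches(spec, now)]


# The two examples from the question, expressed in the hypothetical spec.
SPECS = {
    "weekly_email": Spec(minute=0, hour=10, weekday=2),   # Wednesday 10am
    "monthly_summary": Spec(minute=0, hour=0, day=1),     # first of the month
}
```

Because nothing is remembered between checks, a check that does not happen at exactly the matching minute silently loses that run.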

The other technique takes the current scheduling spec and finds the NEXT time that the item should fire. Then, on each check, it fires all of those items whose "next time" is earlier than the current time. When an item completes, it is rescheduled for its new "next time".
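A minimal sketch of this second design follows. `Item`, `poll`, and `next_top_of_hour` are hypothetical names, and the sketch assumes each item carries a function that computes its next scheduled time. It treats an item as due when its stored time is at or before the current time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable


@dataclass
class Item:
    name: str
    next_run: datetime
    # Returns the first scheduled time strictly after its argument.
    advance: Callable[[datetime], datetime]


def next_top_of_hour(after: datetime) -> datetime:
    """Example schedule: once an hour, at the top of the hour."""
    return after.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)


def poll(items, now: datetime, run: Callable[[Item], None]) -> list:
    """Fire every item whose stored next_run has passed, then reschedule it."""
    fired = []
    for item in items:
        if item.next_run <= now:
            run(item)
            fired.append(item.name)
            # Reschedule only after the job has completed. A real scheduler
            # would read the clock again here; the sketch reuses 'now' so the
            # behaviour stays deterministic.
            item.next_run = item.advance(now)
    return fired
```

Since the due check is just a comparison against a stored timestamp, a late poll still finds the item.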

The first technique cannot handle "missed" items, while the second can only handle those items that were previously scheduled.

Specifically, consider a schedule that runs once every hour, at the top of the hour.

So, say, 1pm, 2pm, 3pm, 4pm.

At 1:30pm, the task runner goes down and stops executing any processes. It does not start again until 3:20pm.

Using the first technique, the scheduler will have fired the 1pm task, but not the 2pm and 3pm tasks, as it was not running when those times passed. The next job to run will be the 4pm job, at, well, 4pm.

Using the second technique, the scheduler will have fired the 1pm task and scheduled the next run for 2pm. Since the system was down, the 2pm task did not run, nor did the 3pm task. But when the system restarted at 3:20pm, it saw that it had "missed" the 2pm task, fired it off at 3:20pm, and then rescheduled it for 4pm. The 3pm run is never made up, because only one "next time" was on record.
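The outage scenario can be replayed with a small simulation. This is only a sketch: it assumes the scheduler checks once a minute, and the function names and the 2024-01-03 date are made up for illustration.

```python
from datetime import datetime, timedelta


def ticks(start, end, down_from, down_until):
    """Yield one check per minute, skipping the window where the runner is down."""
    t = start
    while t <= end:
        if not (down_from <= t < down_until):
            yield t
        t += timedelta(minutes=1)


def simulate_match(tick_times):
    """First technique: fire only when the current time matches the spec."""
    return [t for t in tick_times if t.minute == 0]


def simulate_next_time(tick_times, first_run):
    """Second technique: fire when the stored next time has passed."""
    fired = []
    next_run = first_run
    for t in tick_times:
        if next_run <= t:
            fired.append(t)
            # Reschedule for the next top of the hour after the actual run.
            next_run = t.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    return fired


day = datetime(2024, 1, 3)
checks = list(ticks(
    start=day.replace(hour=13),
    end=day.replace(hour=16),
    down_from=day.replace(hour=13, minute=30),
    down_until=day.replace(hour=15, minute=20),
))

first = [t.strftime("%H:%M") for t in simulate_match(checks)]
second = [t.strftime("%H:%M") for t in simulate_next_time(checks, day.replace(hour=13))]
print(first)   # ['13:00', '16:00']
print(second)  # ['13:00', '15:20', '16:00']
```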

Each technique has its ups and downs. With the first technique, you miss jobs. With the second technique you can still miss jobs, but the scheduler can "catch up" (to a point); it may also run a job "at the wrong time" (maybe it's supposed to run at the top of the hour for a reason).

A benefit of the second technique is that if you reschedule at the END of the executing job, you don't have to worry about a cascading job problem.

Consider a job that runs every minute. With the first technique, the job gets fired each minute. However, if the job has not FINISHED within its minute, you can end up with 2 jobs running (one late in its run, the other starting up). This can be a problem if the job is not designed to run more than once simultaneously. And it can compound (if there's a real problem, after 10 minutes you have 10 jobs all fighting each other).
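To put numbers on the pile-up, the sketch below counts how many instances are alive at a given moment when a job is started on a fixed interval regardless of whether the previous run has finished. `running_at` is a hypothetical helper, and a fixed run duration is assumed for simplicity.

```python
from datetime import datetime, timedelta


def running_at(t, first_start, interval, duration):
    """Count instances started every 'interval' that are still running at time t."""
    count = 0
    start = first_start
    while start <= t:
        if t < start + duration:  # this instance has not finished yet
            count += 1
        start += interval
    return count


first_start = datetime(2024, 1, 3, 13, 0)
every_minute = timedelta(minutes=1)

# A job that takes 70 seconds overlaps with its successor.
slightly_slow = running_at(first_start + timedelta(seconds=65),
                           first_start, every_minute, timedelta(seconds=70))

# A job that hangs (modelled here as taking an hour) keeps accumulating.
hung = running_at(first_start + timedelta(minutes=9, seconds=30),
                  first_start, every_minute, timedelta(hours=1))
```

Here `slightly_slow` comes out as 2 and `hung` as 10, which is the cascade described above.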

With the second technique, if you reschedule at the end of the job, then a job that happens to run just over a minute will "skip" a minute and start up the following minute rather than run on top of itself. So a job scheduled for every minute can actually run at 1:01pm, 1:03pm, 1:05pm, etc.
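The skipping behaviour can be reproduced with a few lines. This sketch assumes an every-minute schedule and a fixed run duration; `next_minute` and `start_times` are illustrative names.

```python
from datetime import datetime, timedelta


def next_minute(after):
    """First minute boundary strictly after the given time."""
    return after.replace(second=0, microsecond=0) + timedelta(minutes=1)


def start_times(first_start, duration, count):
    """Start times when the next run is scheduled only after the current one ends."""
    starts = []
    start = first_start
    for _ in range(count):
        starts.append(start)
        finished = start + duration
        start = next_minute(finished)  # reschedule at the END of the job
    return starts


slow = start_times(datetime(2024, 1, 3, 13, 1), timedelta(seconds=70), 3)
fast = start_times(datetime(2024, 1, 3, 13, 1), timedelta(seconds=30), 3)
print([t.strftime("%H:%M") for t in slow])  # ['13:01', '13:03', '13:05']
print([t.strftime("%H:%M") for t in fast])  # ['13:01', '13:02', '13:03']
```

Only one instance is ever in flight, at the cost of skipped minutes whenever a run overruns.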

Depending on your job design, either of these can be "good" or "bad". There's no right answer here.

Finally, implementing the first technique is really quite trivial compared to implementing the second. The code to determine whether a cron string (say) matches a given time is simple compared to deriving the NEXT time at which a cron string will be valid. I know, and I have a couple hundred lines of code to prove it. It's not pretty.
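To illustrate the gap, here is a sketch of a matcher for a simplified five-field cron string, plus a brute-force "next time" that just steps forward a minute at a time until the matcher says yes. This is not the answerer's code, and the brute-force search is a swapped-in shortcut: computing the next time directly, field by field, is where the couple hundred lines go. Assumptions: only `*`, single numbers, `a-b` ranges, comma lists and `/step` are parsed, day-of-week uses 0 for Sunday, and all five fields must match (standard cron instead ORs day-of-month and day-of-week when both are restricted).

```python
from datetime import datetime, timedelta

# Allowed value ranges: minute, hour, day of month, month, day of week.
RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def parse_field(field, lo, hi):
    """Expand one cron field into the set of values it allows."""
    values = set()
    for part in field.split(","):
        rng, _, step = part.partition("/")
        if rng == "*":
            start, end = lo, hi
        elif "-" in rng:
            a, b = rng.split("-")
            start, end = int(a), int(b)
        else:
            start = end = int(rng)
        values.update(range(start, end + 1, int(step) if step else 1))
    return values


def parse(expr):
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields: minute hour dom month dow")
    return [parse_field(f, lo, hi) for f, (lo, hi) in zip(fields, RANGES)]


def matches(fields, t):
    """The easy part: does time t satisfy the parsed fields?"""
    minute, hour, dom, month, dow = fields
    return (t.minute in minute and t.hour in hour and t.day in dom
            and t.month in month and t.isoweekday() % 7 in dow)


def next_match(expr, after, limit_minutes=366 * 24 * 60):
    """Brute force: test each following minute, up to roughly a year ahead."""
    fields = parse(expr)
    t = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    for _ in range(limit_minutes):
        if matches(fields, t):
            return t
        t += timedelta(minutes=1)
    raise ValueError("no match within the search window")
```

For example, `next_match("0 10 * * 3", datetime(2024, 1, 3, 10, 0))` returns Wednesday 2024-01-10 at 10:00. The search is wasteful for sparse schedules, which is part of why a direct calculation is attractive despite its complexity.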
