Running a cron job in Elastic Beanstalk


Problem description

I have a feature in a Django Elastic Beanstalk app that works like this:

  • Download a file
  • Parse the file and make some API calls with the data from the file
  • Update the database of the EB instance with the new data

In my test instance I just set up a local cron job that calls wget on a specific URL of my Django application, which runs the command.

My problem is how to handle this in a multi-instance Elastic Beanstalk application. Only one instance of my EB application should run this command. I want to avoid race conditions on the database and redundant calls to external APIs from multiple instances, i.e. only one instance should be writing to the database.

However, Googling around suggests that setting up cron jobs is awkward, particularly if you're new to EB like I am. The most promising method seems to be the cron.yaml approach, but from what I can see there is no example anywhere on the web of setting up a cron worker environment.

My understanding is:

  • You include a cron.yaml file in the root of your EB project.
  • Deploy the project
  • The cron jobs are automatically set up in a worker environment (?).
  • The command you defined is run at the specified time(s).

My question is: how do you make sure that only one instance will run this command? Do I have the right idea about how cron.yaml works, or is there something I'm missing?

Solution

Only one instance will run the command, because the cron job does not actually run in a cron daemon per se.

There are a few concepts that might help you quickly grok Amazon's Elastic Beanstalk mindset.

  • An Elastic Beanstalk environment must elect a leader instance, of which there must only ever be one (and it must be a healthy instance, etc.).
  • A worker environment allocates work via an SQS (Simple Queue Service) queue.
  • Once a message has been read from the queue, it is considered 'in-flight' until the worker returns 200 or the request times out or fails. In the first case the message is deleted; in the latter it re-enters the queue. (A redrive policy determines how many times a message can fail before it is sent to the dead-letter queue.)
  • In-flight messages cannot be read again (unless they are returned to the queue).

A message in the queue is picked up by only one of the instances in the worker environment at a time.
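The in-flight behaviour described above can be modelled in a few lines of plain Python. This is only a toy, in-memory sketch to make the semantics concrete; it is not the SQS API, and the names ToyQueue, receive, ack, nack and max_receives are made up for illustration.

```python
import itertools
from collections import deque


class ToyQueue:
    """In-memory stand-in for the queue semantics above (not the SQS API)."""

    def __init__(self, max_receives=3):
        self.visible = deque()    # messages any worker may pick up
        self.in_flight = {}       # receipt id -> (body, receive count)
        self.dead_letter = []     # messages that failed too many times
        self.max_receives = max_receives
        self._ids = itertools.count(1)

    def send(self, body):
        """Add a new message that has not been received yet."""
        self.visible.append((body, 0))

    def receive(self):
        """Hand the next visible message to exactly one caller, or None."""
        if not self.visible:
            return None
        body, count = self.visible.popleft()
        receipt = next(self._ids)
        self.in_flight[receipt] = (body, count + 1)
        return receipt, body

    def ack(self, receipt):
        """The worker returned 200: the message is deleted."""
        del self.in_flight[receipt]

    def nack(self, receipt):
        """The worker failed or timed out: requeue, or dead-letter after too many tries."""
        body, count = self.in_flight.pop(receipt)
        if count >= self.max_receives:
            self.dead_letter.append(body)
        else:
            self.visible.append((body, count))
```

With one message in the queue, the first receive() returns it and a second receive() returns None until the first caller calls nack(), which is the "picked up by only one instance at a time" guarantee in miniature.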

Now the cron.yaml file actually just tells the leader to create a message in the queue, with special attributes, at the times specified in the schedule. When a worker instance then reads this message from the queue, it is dispatched to that one instance only, as a POST request to the specified URL.
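Since the dispatch is an ordinary HTTP POST, the receiving side can be sketched without any framework. The following is a minimal WSGI sketch, not a definitive implementation: make_app and the job callable are hypothetical names, and it assumes (as I recall from the Elastic Beanstalk worker documentation) that periodic tasks carry the task name in an X-Aws-Sqsd-Taskname header. Returning 200 lets the message be deleted; any failure returns 500 so the message goes back to the queue.

```python
def make_app(job, cron_path="/cron/facebook/poll/"):
    """Build a WSGI app that runs `job` when the worker POSTs to `cron_path`."""

    def application(environ, start_response):
        method = environ.get("REQUEST_METHOD", "")
        path = environ.get("PATH_INFO", "")
        # Assumption: the task name arrives in the X-Aws-Sqsd-Taskname header.
        task_name = environ.get("HTTP_X_AWS_SQSD_TASKNAME", "")

        if method != "POST" or path != cron_path:
            status, body = "404 Not Found", b"not found"
        else:
            try:
                job()
                # 200 tells the worker the message can be deleted.
                status, body = "200 OK", ("done " + task_name).encode("utf-8")
            except Exception:
                # Anything other than 200 sends the message back to the queue.
                status, body = "500 Internal Server Error", b"failed"

        headers = [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ]
        start_response(status, headers)
        return [body]

    return application
```

In a Django project the same contract would live in a view. Note that a POST from the worker carries no CSRF token, so with Django's CSRF middleware enabled such a view would likely need to be marked csrf_exempt.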

When I use Django in a worker environment, I create a cron app with views that map to the actions I want. For example, if I wanted to periodically poll a Facebook endpoint, I might have a path /cron/facebook/poll/ which calls a poll_facebook() function in views.py.

That way if I have a cron.yaml as follows, it'll poll Facebook once every hour:

version: 1
cron:
 - name: "pollfacebook"
   url: "/cron/facebook/poll/"
   schedule: "0 * * * *"
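To make the schedule string above concrete, here is a small Python sketch that checks whether a given moment falls on a five-field cron schedule. It is a simplification for illustration only: schedule_matches and field_matches are hypothetical helpers, and they understand just "*" and single integers, not ranges, lists or step values.

```python
from datetime import datetime


def field_matches(field, value):
    """Match one cron field; only '*' and a single integer are supported here."""
    return field == "*" or int(field) == value


def schedule_matches(schedule, moment):
    """Return True if `moment` falls on the 5-field cron `schedule`."""
    minute, hour, day, month, weekday = schedule.split()
    cron_weekday = (moment.weekday() + 1) % 7  # cron counts Sunday as 0
    return (
        field_matches(minute, moment.minute)
        and field_matches(hour, moment.hour)
        and field_matches(day, moment.day)
        and field_matches(month, moment.month)
        and field_matches(weekday, cron_weekday)
    )
```

For "0 * * * *" this is true at 13:00 and false at 13:30, i.e. the task fires at minute 0 of every hour.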
