Issues defining cron jobs in Procfile (Heroku) using apscheduler for a Django project


Question

I am having a problem scheduling a cron job that scrapes a website and stores the results in the database as instances of a model (Movie).

The problem is that the model seems to get loaded before the Procfile is executed.
How should I create a cron job that runs internally in the background and stores the scraped information in the database? Here is my code:

Procfile:

    web: python manage.py runserver 0.0.0.0:$PORT
    scheduler: python cinemas/scheduler.py

scheduler.py:

# More code above
import bs4
import requests

from cinemas.models import Movie
from apscheduler.schedulers.blocking import BlockingScheduler

sched = BlockingScheduler()


# The cron trigger's keyword is `minute` (singular), not `minutes`
@sched.scheduled_job('cron', day_of_week='mon-fri', hour=0, minute=26)
def get_movies_playing_now():
    global url_movies_playing_now
    Movie.objects.all().delete()
    while url_movies_playing_now:
        title = []
        description = []
        # Create a BeautifulSoup object from the page at the current URL
        s = requests.get(url_movies_playing_now, headers=headers)
        soup = bs4.BeautifulSoup(s.text, "html.parser")
        movies = soup.find_all('ul', class_='w462')[0]

        # Find each movie's title
        for movie_title in movies.find_all('h3'):
            title.append(movie_title.text)
        # Find each movie's description
        for movie_description in movies.find_all('p'):
            description.append(movie_description.text.replace(" [More]", "."))

        for t, d in zip(title, description):
            m = Movie(movie_title=t, movie_description=d)
            m.save()

        # Go to the next page to find more movies
        paging = soup.find(class_='pagenating').find_all(
            'a', class_=lambda x: x != "inactive")
        href = ""
        for p in paging:
            if "next" in p.text.lower():
                href = p['href']
        url_movies_playing_now = href


sched.start()
# More code below

cinemas/models.py:

from django.db import models

#Create your models here.

class Movie(models.Model):
    movie_title = models.CharField(max_length=200)
    movie_description = models.CharField(max_length=20200)

This is the error I get when the job runs:


2016-11-17T17:57:06.074914+00:00 app[scheduler.1]: Traceback (most recent call last):
2016-11-17T17:57:06.074931+00:00 app[scheduler.1]:   File "cinemas/scheduler.py", line 2, in <module>
2016-11-17T17:57:06.075058+00:00 app[scheduler.1]:     import cineplex
2016-11-17T17:57:06.075060+00:00 app[scheduler.1]:   File "/app/cinemas/cineplex.py", line 1, in <module>
2016-11-17T17:57:06.075173+00:00 app[scheduler.1]:     from cinemas.models import Movie
2016-11-17T17:57:06.075196+00:00 app[scheduler.1]:   File "/app/cinemas/models.py", line 5, in <module>
2016-11-17T17:57:06.075295+00:00 app[scheduler.1]:     class Movie(models.Model):
2016-11-17T17:57:06.075297+00:00 app[scheduler.1]:   File "/app/.heroku/python/lib/python3.5/site-packages/django/db/models/base.py", line 105, in __new__
2016-11-17T17:57:06.075414+00:00 app[scheduler.1]:     app_config = apps.get_containing_app_config(module)
2016-11-17T17:57:06.075440+00:00 app[scheduler.1]:   File "/app/.heroku/python/lib/python3.5/site-packages/django/apps/registry.py", line 237, in get_containing_app_config
2016-11-17T17:57:06.075585+00:00 app[scheduler.1]:     self.check_apps_ready()
2016-11-17T17:57:06.075586+00:00 app[scheduler.1]:   File "/app/.heroku/python/lib/python3.5/site-packages/django/apps/registry.py", line 124, in check_apps_ready
2016-11-17T17:57:06.075703+00:00 app[scheduler.1]:     raise AppRegistryNotReady("Apps aren't loaded yet.")
2016-11-17T17:57:06.075726+00:00 app[scheduler.1]: django.core.exceptions.AppRegistryNotReady: Apps aren't loaded yet.

Cron job works fine if I do not include Model objects. How should I run this job every day using Model objects without failing?

Thanks.

Recommended answer

That's because you can't just import the Django packages, models, etc. from a standalone script.
To work properly, the Django internals require initialization, which is triggered from manage.py.

Rather than try and re-create all that myself, I always write long-running, non-web commands as a custom management command.

For example, if your app is cinemas, you would:


  • Create ./cinemas/management/commands/scheduler.py.
  • In that file, create a subclass of django.core.management.base.BaseCommand (that subclass must be called Command).
  • In that class, override handle(). In your case, that's where you'd call sched.start().
  • Your Procfile would then have scheduler: python manage.py scheduler

Hope that helps.
