Issues defining cron jobs in Procfile (Heroku) using APScheduler for a Django project
Problem description
I am having a problem scheduling a cron job that scrapes a website and stores the results as instances of a model (Movie) in the database.
The problem seems to be that the model gets loaded before the Procfile is executed.
How should I create a cron job that runs in the background and stores the scraped information in the database? Here is my code:
Procfile:
web: python manage.py runserver 0.0.0.0:$PORT
scheduler: python cinemas/scheduler.py
scheduler.py:
# More code above
from cinemas.models import Movie
from apscheduler.schedulers.blocking import BlockingScheduler

sched = BlockingScheduler()

# The cron trigger field is `minute` (singular); `minutes` is not a valid field.
@sched.scheduled_job('cron', day_of_week='mon-fri', hour=0, minute=26)
def get_movies_playing_now():
    Movie.objects.all().delete()
    # Work on a local copy of the start URL. Reassigning the module-level
    # variable would leave it empty after the first run, so later runs
    # would scrape nothing.
    url = url_movies_playing_now
    while url:
        title = []
        description = []
        # Create a BeautifulSoup object from the current page
        s = requests.get(url, headers=headers)
        soup = bs4.BeautifulSoup(s.text, "html.parser")
        movies = soup.find_all('ul', class_='w462')[0]
        # Find the movies' titles
        for movie_title in movies.find_all('h3'):
            title.append(movie_title.text)
        # Find the movies' descriptions
        for movie_description in movies.find_all('p'):
            description.append(movie_description.text.replace(" [More]", "."))
        for t, d in zip(title, description):
            m = Movie(movie_title=t, movie_description=d)
            m.save()
        # Go to the next page to find more movies
        paging = soup.find(class_='pagenating').find_all(
            'a', class_=lambda x: x != "inactive")
        href = ""
        for p in paging:
            if "next" in p.text.lower():
                href = p['href']
        url = href

sched.start()
# More code below
cinemas/models.py:
from django.db import models

# Create your models here.
class Movie(models.Model):
    movie_title = models.CharField(max_length=200)
    movie_description = models.CharField(max_length=20200)
This is the error I get when the job runs:
2016-11-17T17:57:06.074914+00:00 app[scheduler.1]: Traceback (most recent call last):
2016-11-17T17:57:06.074931+00:00 app[scheduler.1]:   File "cinemas/scheduler.py", line 2, in <module>
2016-11-17T17:57:06.075058+00:00 app[scheduler.1]:     import cineplex
2016-11-17T17:57:06.075060+00:00 app[scheduler.1]:   File "/app/cinemas/cineplex.py", line 1, in <module>
2016-11-17T17:57:06.075173+00:00 app[scheduler.1]:     from cinemas.models import Movie
2016-11-17T17:57:06.075196+00:00 app[scheduler.1]:   File "/app/cinemas/models.py", line 5, in <module>
2016-11-17T17:57:06.075295+00:00 app[scheduler.1]:     class Movie(models.Model):
2016-11-17T17:57:06.075297+00:00 app[scheduler.1]:   File "/app/.heroku/python/lib/python3.5/site-packages/django/db/models/base.py", line 105, in __new__
2016-11-17T17:57:06.075414+00:00 app[scheduler.1]:     app_config = apps.get_containing_app_config(module)
2016-11-17T17:57:06.075440+00:00 app[scheduler.1]:   File "/app/.heroku/python/lib/python3.5/site-packages/django/apps/registry.py", line 237, in get_containing_app_config
2016-11-17T17:57:06.075585+00:00 app[scheduler.1]:     self.check_apps_ready()
2016-11-17T17:57:06.075586+00:00 app[scheduler.1]:   File "/app/.heroku/python/lib/python3.5/site-packages/django/apps/registry.py", line 124, in check_apps_ready
2016-11-17T17:57:06.075703+00:00 app[scheduler.1]:     raise AppRegistryNotReady("Apps aren't loaded yet.")
2016-11-17T17:57:06.075726+00:00 app[scheduler.1]: django.core.exceptions.AppRegistryNotReady: Apps aren't loaded yet.
The cron job works fine if I do not use the model objects. How can I run this job every day using model objects without it failing?
Thanks
Recommended answer
That's because you can't just import Django packages, models, etc. from a plain Python script.
To work properly, Django's internals need to be initialized first (loading the settings and populating the app registry). That initialization is normally triggered by manage.py.
Rather than trying to re-create all of that myself, I always write long-running, non-web commands as custom management commands.
For example, if your app is cinemas, you would:
- Create ./cinemas/management/commands/scheduler.py.
- In that file, create a subclass of django.core.management.base.BaseCommand (the subclass must be called Command).
- In that class, override handle(). In your case, that's where you'd call sched.start().
- Your Procfile would then have: scheduler: python manage.py scheduler
Hope that helps.