Airflow Scheduler Misunderstanding


Question

I'm new to Airflow.

My goal is to run a DAG daily, starting one hour from now.

I seem to be misunderstanding Airflow's "end-of-interval" scheduling rule.

From the docs [(Airflow Docs)][1]:

Note that if you run a DAG on a schedule_interval of one day, the run stamped 2016-01-01 will be triggered soon after 2016-01-01T23:59. In other words, the job instance is started once the period it covers has ended.
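As a rough illustration of the quoted rule (plain Python, not Airflow code; the helper name earliest_trigger is hypothetical), a run stamped with the start of its period cannot start before that period is over:

```python
from datetime import datetime, timedelta


def earliest_trigger(execution_date: datetime, interval: timedelta) -> datetime:
    # The stamp marks the START of the covered period;
    # the run may only begin once the whole period has elapsed.
    return execution_date + interval


stamp = datetime(2016, 1, 1)  # run stamped 2016-01-01
print(earliest_trigger(stamp, timedelta(days=1)))  # 2016-01-02 00:00:00
```

This is only a model of the rule in the docs: the stamp is the beginning of the interval, and the trigger time is its end.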

I set schedule_interval as follows:

schedule_interval = "00 15 * * *"

and start_date as follows:

start_date = datetime(year=2019, month=8, day=7)

My assumption was that if it is now 14:00:00 UTC on 07-08-2019 (7 August 2019), my DAG would be executed in exactly one hour. However, my DAG is not starting at all.

Recommended answer

There is a whole FAQ page about Airflow jobs not being scheduled: https://airflow.apache.org/faq.html

The key thing to note here is:


The Airflow scheduler triggers the task soon after start_date + schedule_interval has passed.

As I understand it, you want a task with start_date=datetime(year=2019, month=8, day=7) to run at 15:00 UTC daily. schedule_interval="00 15 * * *" does mean the task runs every day at 15:00 UTC. However, according to the docs, the scheduler triggers your task only after start_date + schedule_interval has passed, so Airflow won't trigger it until the next day, August 8th 2019 15:00:00 UTC. Alternatively, you can change the start day to the 6th, so that the first interval ends on August 7th at 15:00 UTC.

It might be easier to understand this from an ETL point of view: you can only process the data for a given period after that period has passed. So August 7th 2019 15:00:00 UTC is your starting point, and you need to wait until August 8th 2019 15:00:00 UTC before the task for that period can run.
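The arithmetic above can be sketched in plain Python. This is a simplified model under stated assumptions, not Airflow's actual scheduler code: the helper first_run is hypothetical, it handles only a once-a-day schedule such as "00 15 * * *", and naive datetimes are treated as UTC.

```python
from datetime import datetime, time, timedelta


def first_run(start_date: datetime, at: time = time(15, 0)):
    """Return (execution_date, earliest_trigger) for a daily schedule firing at `at`.

    Hypothetical helper modelling the "end of interval" rule.
    """
    # The first interval starts at the first schedule tick at or after start_date.
    tick = datetime.combine(start_date.date(), at)
    if tick < start_date:
        tick += timedelta(days=1)
    # The run for that interval is triggered only when the interval has ended.
    return tick, tick + timedelta(days=1)


# start_date on the 7th: the first run is triggered on the 8th at 15:00.
print(first_run(datetime(2019, 8, 7)))
# start_date on the 6th: the first run is triggered on the 7th at 15:00.
print(first_run(datetime(2019, 8, 6)))
```

Under this model, moving start_date back by one day is what makes a run fire at 15:00 UTC on August 7th; that run is stamped August 6th 15:00 because it covers the interval starting then.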

Also, note that Airflow distinguishes between execution_date and start_date; you can find more here
