Problem with start date and scheduled date in Apache Airflow

Question

I am working with Apache Airflow and I have a problem with the scheduled date and the start date.

I want a DAG to run every day at 08:00 UTC. So, what I did is:

from datetime import datetime, timedelta

from airflow import DAG

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2020, 12, 7, 10, 0, 0),
    'email': ['example@emaiil.com'],
    'email_on_failure': True,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(hours=5),
}
# never run
dag = DAG(dag_id='id', default_args=default_args, schedule_interval='0 8 * * *', catchup=True)

I uploaded the DAG on 2020-12-07 and I wanted it to run on 2020-12-08 at 08:00:00.

I set the start_date to 2020-12-07 at 10:00:00 to avoid running it on 2020-12-07 at 08:00:00 and to trigger it only the next day, but it didn't work.

What I did then was modify the start date:

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2020, 12, 7, 7, 59, 0),
    'email': ['example@emaiil.com'],
    'email_on_failure': True,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(hours=5),
}
# never run
dag = DAG(dag_id='etl-ca-cpke-spark_dev_databricks', default_args=default_args, schedule_interval='0 8 * * *', catchup=True)

Now the start date is 1 minute before the DAG should run, and indeed, because catchup is set to True, the DAG has been triggered for 2020-12-07 at 08:00:00, but it has not been triggered for 2020-12-08 at 08:00:00.

Why?

Recommended answer

Airflow schedules a run at the END of its interval (see the Airflow documentation on scheduling).

This means that when you set:

start_date: datetime(2020, 12, 7, 8, 0,0)
schedule_interval: '0 8 * * *'

The first run will kick in on 2020-12-08 at about 08:00 (the exact time depends on available resources).

The execution_date of this run will be 2020-12-07 08:00.

The next run will kick in on 2020-12-09 at 08:00.

The execution_date of that run will be 2020-12-08 08:00.

Since today is 2020-12-08, the next run hasn't kicked in because it's not the END of its interval yet.
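
The end-of-interval rule can be illustrated with a small sketch that uses only the Python standard library. It is not Airflow code: `daily_runs` is a hypothetical helper that models a daily `'0 8 * * *'` schedule, and it assumes that a start_date not aligned with the schedule is rounded up to the next 08:00, which is how the scheduler appears to behave in the question.

```python
from datetime import datetime, timedelta


def daily_runs(start_date, hour=8, count=2):
    """Return (execution_date, earliest_trigger_time) pairs for a daily schedule.

    Hypothetical model of a '0 <hour> * * *' schedule: each run covers one
    24-hour interval and is triggered only once that interval has ended.
    """
    interval = timedelta(days=1)
    # First schedule point at or after start_date (assumed rounding rule).
    first = start_date.replace(hour=hour, minute=0, second=0, microsecond=0)
    if first < start_date:
        first += interval
    runs = []
    for i in range(count):
        execution_date = first + i * interval     # start of the interval
        trigger_time = execution_date + interval  # end of the interval
        runs.append((execution_date, trigger_time))
    return runs


for start in (datetime(2020, 12, 7, 8, 0), datetime(2020, 12, 7, 10, 0)):
    print("start_date:", start)
    for execution_date, trigger_time in daily_runs(start):
        print("  execution_date", execution_date, "-> triggered at", trigger_time)
```

Under these assumptions, a start_date of 2020-12-07 08:00 (or 07:59) gives a first run with execution_date 2020-12-07 08:00 that is triggered on 2020-12-08 at 08:00, while a start_date of 2020-12-07 10:00 pushes the first execution_date to 2020-12-08 08:00, so nothing is triggered before 2020-12-09 at 08:00.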
