How to account for daylight saving time when using a cron schedule in Airflow


Question

In Airflow, I'd like a job to run at a specific time each day in a non-UTC timezone. How can I go about scheduling this?

The problem is that once daylight saving time starts or ends, my job runs either an hour too early or an hour too late. In the Airflow docs, it seems like this is a known issue:

In case you set a cron schedule, Airflow assumes you will always want to run at the exact same time. It will then ignore day light savings time. Thus, if you have a schedule that says run at end of interval every day at 08:00 GMT+1 it will always run end of interval 08:00 GMT+1, regardless if day light savings time is in place.

Has anyone else run into this issue? Is there a workaround? Surely the best practice cannot be to alter all the scheduled times every time daylight saving time starts or ends?

Thanks.

Solution

Starting with Airflow 1.10, time-zone-aware DAGs can be defined by using a time-zone-aware datetime object as start_date. For Airflow to schedule DAG runs always at the same local wall-clock time (regardless of a possible daylight-saving-time switch), use a cron expression as schedule_interval. To make Airflow schedule DAG runs at fixed intervals of elapsed time (regardless of a possible daylight-saving-time switch), use datetime.timedelta() as schedule_interval.

For example, consider the following code that, first, uses a cron expression to schedule two consecutive DAG runs, and then uses a fixed interval to do the same.

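import pendulum
from airflow import DAG
from datetime import datetime, timedelta

# Time-zone-aware start date: 2019-10-25 08:00 in Europe/Kiev,
# two days before the 2019 daylight-saving-time switch.
START_DATE = datetime(
    year=2019,
    month=10,
    day=25,
    hour=8,
    minute=0,
    tzinfo=pendulum.timezone('Europe/Kiev'),
)


def gen_execution_dates(start_date, schedule_interval):
    """Print the next two execution dates for the given schedule."""
    dag = DAG(
        dag_id='id', start_date=start_date, schedule_interval=schedule_interval
    )
    execution_date = dag.start_date
    for i in range(1, 3):
        # Ask the DAG for the next scheduled time after the current one.
        execution_date = dag.following_schedule(execution_date)
        # Convert to the DAG's time zone so the UTC offset is visible.
        print(
            f'[Run {i}: Execution Date for "{schedule_interval}"]:',
            dag.timezone.convert(execution_date),
        )


# Cron expression: every day at 08:00 local time.
gen_execution_dates(START_DATE, '0 8 * * *')
# Fixed interval: every 24 hours of elapsed time.
gen_execution_dates(START_DATE, timedelta(days=1))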

Running the code produces the following output:

[Run 1: Execution Date for "0 8 * * *"]: 2019-10-26 08:00:00+03:00
[Run 2: Execution Date for "0 8 * * *"]: 2019-10-27 08:00:00+02:00
[Run 1: Execution Date for "1 day, 0:00:00"]: 2019-10-26 08:00:00+03:00
[Run 2: Execution Date for "1 day, 0:00:00"]: 2019-10-27 07:00:00+02:00

For the Europe/Kiev zone, daylight saving time in 2019 ended on 2019-10-27 at 04:00:00+03:00, when clocks were set back to 03:00:00+02:00. That is, between Run 1 and Run 2 in our example.

The first two output lines show that for the DAG runs scheduled with a cron expression, the first run and the second run are both scheduled for 08:00 local time, although with different UTC offsets: Eastern European Summer Time (EEST, UTC+3) and Eastern European Time (EET, UTC+2) respectively.

The last two output lines show that for the DAG runs scheduled with a fixed interval the first run is scheduled for 08:00 (EEST), and the second run is scheduled exactly 1 day (24 hours) later, which is at 07:00 (EET) due to the daylight-saving-time switch.
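The same distinction can be reproduced without Airflow. The following is only a minimal sketch using the standard-library zoneinfo module (Python 3.9+, with the system time zone database available); the names KIEV, run_1, cron_like, fixed and elapsed are made up for this illustration and are not part of Airflow. It mimics the two scheduling behaviours for Run 1 and Run 2 and measures how much real time passes between them.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9

# Illustrative names only; none of this is Airflow API.
KIEV = ZoneInfo("Europe/Kiev")

# Run 1 from the example above: 08:00 EEST (UTC+3).
run_1 = datetime(2019, 10, 26, 8, 0, tzinfo=KIEV)

# Cron-like behaviour: same wall-clock time on the next calendar day.
# Adding a timedelta to an aware datetime keeps the wall clock and
# re-evaluates the UTC offset for the new date.
cron_like = run_1 + timedelta(days=1)

# Fixed-interval behaviour: exactly 24 elapsed hours, computed in UTC.
fixed = (run_1.astimezone(timezone.utc) + timedelta(hours=24)).astimezone(KIEV)


def elapsed(later, earlier):
    """Real elapsed time between two aware datetimes, measured in UTC."""
    return later.astimezone(timezone.utc) - earlier.astimezone(timezone.utc)


print(cron_like.isoformat(), elapsed(cron_like, run_1))
print(fixed.isoformat(), elapsed(fixed, run_1))
```

The first line printed is 2019-10-27T08:00:00+02:00 with an elapsed time of 1 day, 1:00:00, and the second is 2019-10-27T07:00:00+02:00 with an elapsed time of 1 day, 0:00:00. In other words, keeping the local time at 08:00 across the switch means waiting 25 real hours, while waiting exactly 24 hours moves the local time to 07:00.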

The following figure illustrates the example:
