Cloud Composer (Airflow) jobs stuck


Question

My Cloud Composer managed Airflow environment has been stuck for hours, ever since I canceled a task instance that was taking too long (let's call it Task A).

I've cleared all the DAG runs and task instances, but a few jobs are still running and one job is in the Shutdown state (I suppose it is the job for Task A) (snapshot of my Jobs).

Besides, the scheduler does not seem to be running, since recently deleted DAGs keep appearing in the dashboard.

Is there a way to kill the jobs or reset the scheduler? Any ideas for getting Composer unstuck are welcome.

Recommended answer

You can restart the scheduler as follows, from your Cloud Shell:

1. Determine your environment's Kubernetes cluster:

gcloud composer environments describe ENVIRONMENT_NAME \
    --location LOCATION

2. Get credentials and connect to the Kubernetes cluster:

gcloud container clusters get-credentials ${GKE_CLUSTER} --zone ${GKE_LOCATION}

3. Run the following command to restart the scheduler:

kubectl get deployment airflow-scheduler -o yaml | kubectl replace --force -f -

Steps 1 and 2 are detailed here. Step 3 replaces the "airflow-scheduler" deployment with itself: with --force, kubectl deletes and recreates the deployment, which restarts the scheduler.
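The three steps can be chained into one helper. This is only a sketch under a few assumptions that are not in the answer above: the function names are made up for illustration, the cluster is zonal (so its path has the form projects/PROJECT/zones/ZONE/clusters/NAME and --zone applies), and the airflow-scheduler deployment lives in the namespace kubectl is currently pointed at.

```shell
# Hypothetical helpers (names are illustrative, not part of gcloud or kubectl).

# Print the cluster name: the last path segment of the gkeCluster value.
cluster_name() {
  printf '%s\n' "${1##*/}"
}

# Print the zone: the path segment just before "/clusters/".
# Assumes a zonal cluster path: projects/P/zones/Z/clusters/NAME.
cluster_zone() {
  _rest="${1%/clusters/*}"
  printf '%s\n' "${_rest##*/}"
}

# Usage: restart_scheduler ENVIRONMENT_NAME LOCATION
restart_scheduler() {
  # Step 1: read the GKE cluster path from the environment description.
  _gke_path="$(gcloud composer environments describe "$1" \
      --location "$2" --format='value(config.gkeCluster)')" || return 1

  # Step 2: fetch credentials so kubectl talks to that cluster.
  gcloud container clusters get-credentials "$(cluster_name "$_gke_path")" \
      --zone "$(cluster_zone "$_gke_path")" || return 1

  # Step 3: replace the scheduler deployment with itself to restart it.
  kubectl get deployment airflow-scheduler -o yaml | kubectl replace --force -f -
}
```

After sourcing the file, it would be called as restart_scheduler ENVIRONMENT_NAME LOCATION, with the same placeholders as in step 1.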

If restarting the scheduler doesn't help, you may need to recreate your Composer environment. If the problem happens every time, troubleshoot your DAGs as well.
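Before recreating the environment, it may be worth checking whether the scheduler deployment actually came back up. The sketch below is an assumption-laden illustration rather than part of the original answer: the function names are hypothetical, and it searches all namespaces because the scheduler may not be in the default one in every Composer version.

```shell
# Hypothetical helpers (names are illustrative).

# Read `kubectl get deployments --all-namespaces --no-headers` output on stdin
# (columns: NAMESPACE NAME ...) and print the namespace of airflow-scheduler.
scheduler_namespace() {
  awk '$2 == "airflow-scheduler" { print $1; exit }'
}

# Wait for the scheduler deployment to finish rolling out.
check_scheduler() {
  _ns="$(kubectl get deployments --all-namespaces --no-headers | scheduler_namespace)"
  if [ -z "$_ns" ]; then
    echo "airflow-scheduler deployment not found" >&2
    return 1
  fi
  # Returns non-zero if the deployment is not ready within the timeout.
  kubectl rollout status deployment/airflow-scheduler -n "$_ns" --timeout=120s
}
```

If check_scheduler fails or times out, kubectl get pods -n NAMESPACE and kubectl logs on the scheduler pod are the usual next places to look.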
