Cloud Composer (Airflow) jobs stuck


Problem Description


My Cloud Composer-managed Airflow got stuck for hours after I canceled a task instance that was taking too long (let's call it Task A).


I've cleared all the DAG runs and task instances, but a few jobs are still running and one job is in the Shutdown state (I assume it belongs to Task A) (snapshot of my Jobs).

Also, recently deleted DAGs keep reappearing in the dashboard.


Is there a way to kill the jobs or reset the scheduler? Any idea to un-stick Composer would be welcome.

Answer


You can restart the scheduler as follows:

From your Cloud Shell:


1. Determine your environment's Kubernetes cluster:

gcloud composer environments describe ENVIRONMENT_NAME \
    --location LOCATION 
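The `describe` command prints the full environment description; the GKE cluster appears under `config.gkeCluster` as a full resource path. As a sketch (ENVIRONMENT_NAME and LOCATION are placeholders, and the `--format` projection is an assumption about the describe output), you can capture the cluster name and zone that step 2 needs like this:

```shell
# Sketch: extract the GKE cluster resource path from the environment
# description (ENVIRONMENT_NAME and LOCATION are placeholders)
GKE_CLUSTER_PATH=$(gcloud composer environments describe ENVIRONMENT_NAME \
    --location LOCATION --format="value(config.gkeCluster)")

# The path has the shape projects/PROJECT/zones/ZONE/clusters/CLUSTER;
# split out the zone and cluster name for get-credentials in step 2
GKE_LOCATION=$(echo "$GKE_CLUSTER_PATH" | cut -d/ -f4)
GKE_CLUSTER=$(echo "$GKE_CLUSTER_PATH" | cut -d/ -f6)
```

With `GKE_CLUSTER` and `GKE_LOCATION` set this way, the command in step 2 can be run as written.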


2. Get credentials and connect to the Kubernetes cluster:

gcloud container clusters get-credentials ${GKE_CLUSTER} --zone ${GKE_LOCATION}


3. Run the following command to restart the scheduler:

kubectl get deployment airflow-scheduler -o yaml | kubectl replace --force -f -
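If your kubectl version supports it (1.15 or newer), a gentler alternative sketch to the force-replace above is a rolling restart, which restarts the scheduler pods without deleting and recreating the deployment object:

```shell
# Alternative sketch (assumes kubectl >= 1.15): rolling-restart the
# scheduler deployment instead of force-replacing it
kubectl rollout restart deployment airflow-scheduler

# Wait until the restarted scheduler pod reports ready
kubectl rollout status deployment airflow-scheduler
```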


Steps 1 and 2 are detailed here. Step 3 replaces the airflow-scheduler deployment with itself, which restarts the service.


If restarting the scheduler doesn't help, you may also need to recreate your Composer environment, and troubleshoot your DAGs if this happens every time.
