Running spark-submit programs on a different cluster (1**.1*.0.21) from Airflow (1**.1*.0.35): how to connect to a remote cluster from Airflow


Problem description

I have been trying to run spark-submit programs from Airflow, but the Spark files are on a different cluster (1**.1*.0.21) while Airflow runs on (1**.1*.0.35). I am looking for a detailed explanation of this topic with examples. I cannot copy or download any XML files or other files to my Airflow cluster.

When I try to use the SSH hook, it fails with the error below. I also have many doubts about using the SSHOperator and BashOperator.

Broken DAG: [/opt/airflow/dags/s.py] No module named paramiko
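
This error usually just means that the paramiko package, which Airflow's SSH hook and SSHOperator depend on, is not installed in the Airflow environment; installing it (for example with pip install paramiko, or pip install apache-airflow[ssh] to pull in the SSH extras) normally resolves it. A minimal check of that assumption, run in the same Python environment as Airflow:

# Verify that paramiko is importable from the Python environment Airflow runs in;
# if this raises ImportError, install it (e.g. pip install paramiko).
import paramiko
print(paramiko.__version__)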

Recommended answer

I got the connection working; here is my code and procedure.

import airflow
from airflow import DAG
from airflow.contrib.operators.ssh_operator import SSHOperator
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta


# DAG-level params are made available to the Jinja-templated command below
dag = DAG(dag_id="spk", description='filer',
          schedule_interval='* * * * *',
          start_date=airflow.utils.dates.days_ago(2),
          params={'project_source': '/home/afzal',
                  'spark_submit': '/usr/hdp/current/spark2-client/bin/spark-submit --principal hdfs-ivory@KDCAUTH.COM --keytab /etc/security/keytabs/hdfs.headless.keytab --master yarn --deploy-mode client airpy.py'})

templated_bash_command = """
            cd {{ params.project_source }}
            {{ params.spark_submit }} 
            """

# Runs the templated command on the remote cluster over the 'spark_21' SSH connection
t1 = SSHOperator(
       task_id="SSH_task",
       ssh_conn_id='spark_21',
       command=templated_bash_command,
       dag=dag
       )
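
Since the question also raises doubts about BashOperator, here is a rough alternative sketch (not part of the procedure above) that runs the same templated command through the system ssh client instead of the SSHOperator. It assumes key-based, passwordless SSH from the Airflow host to the remote cluster is already set up for user afzal; the task name is hypothetical and the host is the same masked hostname used in the connection below:

# Hypothetical alternative: run the same command over plain ssh with BashOperator.
# Assumes passwordless SSH (keys) from the Airflow host to mas****p for user 'afzal'.
ssh_spark_submit = BashOperator(
    task_id='ssh_spark_submit',
    bash_command="ssh afzal@mas****p 'cd {{ params.project_source }} && {{ params.spark_submit }}'",
    dag=dag,
)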

I also created a connection under 'Admin > Connections' in Airflow:

Conn Id : spark_21
Conn Type : SSH
Host : mas****p
Username : afzal
Password : ***** 
Port  :
Extra  :

The username and password are used to log in to the desired cluster.
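
To sanity-check that connection outside of a DAG run, a minimal, hypothetical test (assuming the spark_21 connection above exists and paramiko is installed) can open the same SSH hook the operator uses from a Python shell on the Airflow host:

# Open the 'spark_21' connection directly and run a trivial command on the remote host.
from airflow.contrib.hooks.ssh_hook import SSHHook

hook = SSHHook(ssh_conn_id='spark_21')
client = hook.get_conn()                      # returns a paramiko SSHClient
stdin, stdout, stderr = client.exec_command('hostname')
print(stdout.read().decode())                 # should print the remote cluster's hostname
client.close()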
