How to set up Zeppelin to work with a remote EMR Yarn cluster


Question

I have an Amazon EMR Hadoop v2.6 cluster with Spark 1.4.1 and the Yarn resource manager. I want to deploy Zeppelin on a separate machine so that the EMR cluster can be turned off when no jobs are running.

I tried following the instructions at https://zeppelin.incubator.apache.org/docs/install/yarn_install.html but without much success.

Can somebody demystify the steps for connecting Zeppelin to an existing Yarn cluster from a different machine?

Recommended answer

[1] Install Zeppelin with the appropriate build parameters:

git clone https://github.com/apache/incubator-zeppelin.git ~/zeppelin;
cd ~/zeppelin;
mvn clean package -Pspark-1.4 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pyarn -DskipTests

[2] Update the EMR_MASTER EC2 security group to accept incoming requests on all ports, so that the cluster can communicate with Zeppelin. (This should be limited to specific ports, but I do not yet know which ones.)
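
As a rough sketch of step [2], the rule can be added with the AWS CLI. The security group ID and the Zeppelin server IP below are hypothetical placeholders, not values from the answer, and limiting the source to a single /32 address is an assumption made here to avoid opening the ports to everyone. The script only prints the command so it can be reviewed before being run.

```shell
#!/bin/sh
# Hypothetical placeholders: replace with the master node's security group
# and the public IP of the standalone Zeppelin server.
SG_ID="${SG_ID:-sg-0123456789abcdef0}"
ZEPPELIN_IP="${ZEPPELIN_IP:-203.0.113.10}"

# Build the AWS CLI call that opens all TCP ports to the Zeppelin server only.
build_ingress_cmd() {
  printf 'aws ec2 authorize-security-group-ingress --group-id %s --protocol tcp --port 0-65535 --cidr %s/32\n' \
    "$SG_ID" "$ZEPPELIN_IP"
}

# Print the command for review; paste it into a shell to apply it.
build_ingress_cmd
```

Once the ports that are actually required are known, the 0-65535 range can be narrowed accordingly.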

[3] Copy the directory EMR_MASTER:/etc/hadoop/conf to MY_STANDALONE_SERVER:/home/zeppelin/hadoop-conf.
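
One possible way to do the copy in step [3] is scp, run on the standalone server. The hostname and key file below are hypothetical placeholders, and the `hadoop` login is assumed to be the SSH user on the EMR master. By default the script prints the command; setting RUN=1 executes it.

```shell
#!/bin/sh
# Hypothetical placeholders: replace with your master hostname and key file.
EMR_MASTER_HOST="${EMR_MASTER_HOST:-emr-master.example.com}"
SSH_KEY="${SSH_KEY:-$HOME/.ssh/emr-key.pem}"
DEST_DIR="${DEST_DIR:-/home/zeppelin/hadoop-conf}"

# Show the scp command that would be run.
build_copy_cmd() {
  printf 'scp -r -i %s hadoop@%s:/etc/hadoop/conf %s\n' \
    "$SSH_KEY" "$EMR_MASTER_HOST" "$DEST_DIR"
}

if [ "${RUN:-0}" = "1" ]; then
  # DEST_DIR should not exist yet; if it does, scp puts the files
  # in a "conf" subdirectory inside it instead.
  scp -r -i "$SSH_KEY" "hadoop@$EMR_MASTER_HOST:/etc/hadoop/conf" "$DEST_DIR"
else
  build_copy_cmd
fi
```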

[4] zeppelin/conf/zeppelin-env.sh should contain:

export MASTER=yarn-client
export HADOOP_CONF_DIR=/home/zeppelin/hadoop-conf
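
Before starting Zeppelin, it may help to confirm that the copied directory really contains the client configuration. The sketch below is only a sanity check and not part of the original answer; it assumes that core-site.xml and yarn-site.xml are the files that matter most for reaching the Yarn resource manager, and `check_hadoop_conf` is a helper name made up for this example.

```shell
#!/bin/sh
# Return 0 if the directory contains the Hadoop client config files
# assumed here to be required (core-site.xml and yarn-site.xml).
check_hadoop_conf() {
  dir="$1"
  for f in core-site.xml yarn-site.xml; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      return 1
    fi
  done
  echo "ok: $dir"
}

# If HADOOP_CONF_DIR is already exported, check it right away.
if [ -n "${HADOOP_CONF_DIR:-}" ]; then
  check_hadoop_conf "$HADOOP_CONF_DIR" || true
fi
```

For the setup above, the call would be check_hadoop_conf /home/zeppelin/hadoop-conf.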

Note: Spark parameters such as spark.executor.instances are taken from the Interpreter settings and should be specified there.
