Configure Zeppelin's Spark Interpreter on EMR when starting a cluster


Problem Description


I am creating clusters on EMR and configuring Zeppelin to read the notebooks from S3. To do that, I am using a JSON object that looks like this:

[
  {
    "Classification": "zeppelin-env",
    "Properties": {

    },
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
        "ZEPPELIN_NOTEBOOK_STORAGE":"org.apache.zeppelin.notebook.repo.S3NotebookRepo",
          "ZEPPELIN_NOTEBOOK_S3_BUCKET":"hs-zeppelin-notebooks",
          "ZEPPELIN_NOTEBOOK_USER":"user"
        },
        "Configurations": [

        ]
      }
    ]
  }
]

I am pasting this object in the Software configuration page of EMR. My question is: how/where can I configure the Spark interpreter directly, without the need to configure it manually from Zeppelin each time I start a cluster?

Solution

This is a bit involved; you will need to do two things:

  1. Edit Zeppelin's interpreter.json
  2. Restart the interpreter

So what you need to do is write a shell script and then add an extra step to the EMR cluster configuration that runs this shell script.

The Zeppelin configuration is JSON, so you can use jq (a command-line JSON processor) to manipulate it. I don't know what exactly you want to change, but here is an example that adds the (mysteriously missing) DepInterpreter:

#!/bin/bash
set -e

# 1. Edit the Spark interpreter settings: append the DepInterpreter to the
#    Spark interpreter group (id 2ANGGHHMQ in this Zeppelin installation).
#    jq cannot edit a file in place, and piping tee back into the file it is
#    reading from risks truncating it, so write to a temporary file first.
jq '.interpreterSettings."2ANGGHHMQ".interpreterGroup |= . + [{"class":"org.apache.zeppelin.spark.DepInterpreter", "name":"dep"}]' \
    /etc/zeppelin/conf/interpreter.json > /tmp/interpreter.json
sudo -u zeppelin cp /tmp/interpreter.json /etc/zeppelin/conf/interpreter.json

# 2. Trigger a restart of the Spark interpreter through Zeppelin's REST API
curl -X PUT http://localhost:8890/api/interpreter/setting/restart/2ANGGHHMQ
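As a sanity check, the jq filter can be tried against a minimal interpreter.json. The interpreter id 2ANGGHHMQ comes from the answer above, but the single-entry interpreter group here is just a stand-in for illustration:

```shell
# Toy interpreter.json with one entry in the Spark interpreter group;
# the |= operator updates interpreterGroup in place by appending an entry.
echo '{"interpreterSettings":{"2ANGGHHMQ":{"interpreterGroup":[{"class":"org.apache.zeppelin.spark.SparkInterpreter","name":"spark"}]}}}' \
  | jq '.interpreterSettings."2ANGGHHMQ".interpreterGroup |= . + [{"class":"org.apache.zeppelin.spark.DepInterpreter","name":"dep"}]'
```

The output is the same document with a second group entry named "dep", which is what the script writes back into /etc/zeppelin/conf/interpreter.json.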

Put this shell script in an S3 bucket, then start your EMR cluster with

--steps Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,Jar=s3://eu-west-1.elasticmapreduce/libs/script-runner/script-runner.jar,Args=[s3://mybucket/script.sh]
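Putting both halves together, a full cluster launch might look like the sketch below. The cluster name, key name, release label, instance settings, and bucket paths are placeholders; zeppelin-env.json holds the configuration object from the question, and the --steps argument is the one from the answer above:

```shell
# Sketch only: names, release label, and instance settings are placeholders.
aws emr create-cluster \
  --name "zeppelin-cluster" \
  --release-label emr-5.8.0 \
  --applications Name=Spark Name=Zeppelin \
  --instance-type m3.xlarge \
  --instance-count 3 \
  --ec2-attributes KeyName=mykey \
  --use-default-roles \
  --configurations file://./zeppelin-env.json \
  --steps Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,Jar=s3://eu-west-1.elasticmapreduce/libs/script-runner/script-runner.jar,Args=[s3://mybucket/script.sh]
```

This way both the S3 notebook storage (via --configurations) and the interpreter edit (via the script step) are applied at every cluster launch, with no manual work in the Zeppelin UI.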
