How to set a custom environment variable in EMR to be available for a Spark application


Problem description


I need to set a custom environment variable in EMR to be available when running a spark application.

I have tried adding this:

    ...
    --configurations '[
      {
        "Classification": "spark-env",
        "Configurations": [
          {
            "Classification": "export",
            "Configurations": [],
            "Properties": { "SOME-ENV-VAR": "qa1" }
          }
        ],
        "Properties": {}
      }
    ]'
    ...

I also tried replacing "spark-env" with "hadoop-env", but nothing seems to work.

There is an answer on the AWS forums, but I can't figure out how to apply it. I'm running EMR 5.3.1 and launch the cluster with a preconfigured step from the CLI: aws emr create-cluster ...

Solution

Put the custom configuration in a JSON file, say custom_config.json, with content like the following:

[
  {
    "Classification": "spark-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "VARIABLE_NAME": "VARIABLE_VALUE"
        }
      }
    ]
  }
]
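If the file is generated by a script rather than written by hand, a minimal Python sketch could look like the one below. The helper names build_spark_env_config and write_config are hypothetical and not part of any AWS tooling. The name check rests on an assumption: that the "export" classification ends up as shell `export NAME=VALUE` lines, in which case a hyphenated name such as SOME-ENV-VAR from the question would not be a valid shell identifier.

```python
import json
import re

# Assumption: variable names must be valid shell identifiers, because the
# "export" classification is expected to become `export NAME=VALUE` lines.
_SHELL_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_spark_env_config(env_vars):
    """Return a configuration list that exports env_vars through spark-env."""
    for name in env_vars:
        if not _SHELL_NAME.match(name):
            raise ValueError(f"not a valid shell variable name: {name!r}")
    return [
        {
            "Classification": "spark-env",
            "Properties": {},
            "Configurations": [
                {
                    "Classification": "export",
                    # Values are written as JSON strings, matching the example file.
                    "Properties": {k: str(v) for k, v in env_vars.items()},
                }
            ],
        }
    ]


def write_config(env_vars, path="custom_config.json"):
    """Write the configuration to path as JSON (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(build_spark_env_config(env_vars), fh, indent=2)
```

Calling write_config({"VARIABLE_NAME": "VARIABLE_VALUE"}) would produce a file with the same structure as the example above, while a name containing a hyphen is rejected before anything is written.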

Then, when creating the EMR cluster, pass the file reference to the --configurations option:

aws emr create-cluster --configurations file://custom_config.json --other-options...
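To confirm from inside the application that the variable actually arrived, a small check like the following sketch may help. The function name require_env is hypothetical. It only inspects the process it runs in (typically the driver); whether the variable is also visible in other processes of the job is not something the answer states, so treat that as untested.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise if it is unset."""
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"environment variable {name} is not set in this process")
    return value
```

Calling require_env("VARIABLE_NAME") early in the application makes a missing variable fail fast with a clear message instead of surfacing later as an unrelated error.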
