On Amazon EMR 4.0.0, setting /etc/spark/conf/spark-env.conf is ineffective

Problem description

I'm launching my spark-based hiveserver2 on Amazon EMR, which has an extra classpath dependency. Due to this bug in Amazon EMR:

https://petz2000.wordpress.com/2015/08/18/get-blas-working-with-spark-on-amazon-emr/

my classpath cannot be passed through the --driver-class-path option.

So I have to modify /etc/spark/conf/spark-env.conf to add the extra classpath:

# Add Hadoop libraries to Spark classpath
SPARK_CLASSPATH="${SPARK_CLASSPATH}:${HADOOP_HOME}/*:${HADOOP_HOME}/../hadoop-hdfs/*:${HADOOP_HOME}/../hadoop-mapreduce/*:${HADOOP_HOME}/../hadoop-yarn/*:/home/hadoop/git/datapassport/*"

where "/home/hadoop/git/datapassport/*" is the extra classpath entry I need.

However, after the server launches successfully, the Spark environment shows that my change has no effect:

spark.driver.extraClassPath :/usr/lib/hadoop/*:/usr/lib/hadoop/../hadoop-hdfs/*:/usr/lib/hadoop/../hadoop-mapreduce/*:/usr/lib/hadoop/../hadoop-yarn/*:/etc/hive/conf:/usr/lib/hadoop/../hadoop-lzo/lib/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*

Is this configuration file obsolete? Where is the new file and how to fix this problem?

Recommended answer

You can use --driver-class-path.

Start a spark-shell on the master node of a fresh EMR cluster.

spark-shell --master yarn-client
scala> sc.getConf.get("spark.driver.extraClassPath")
res0: String = /etc/hadoop/conf:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-yarn/*:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*

Add your JAR files to the EMR cluster using a bootstrap action (for example, the --bootstrap-actions option of aws emr create-cluster).

When you call spark-submit, prepend (or append) your JAR files to the value of spark.driver.extraClassPath that you got from spark-shell:

spark-submit --master yarn-cluster --driver-class-path "/home/hadoop/my-custom-jar.jar:/etc/hadoop/conf:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-yarn/*:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*"

This worked for me using EMR release builds 4.1 and 4.2.

The way spark.driver.extraClassPath is built may change between releases, which may be why SPARK_CLASSPATH no longer works.
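
Because the default value may differ between releases, hard-coding the long string copied from spark-shell is fragile. Below is a minimal POSIX shell sketch that instead reads the default at submit time and prepends your own JARs. It assumes that EMR writes spark.driver.extraClassPath into /etc/spark/conf/spark-defaults.conf as a whitespace-separated "key value" line; check this on your release. The helpers read_spark_default and build_driver_classpath, as well as the application JAR and class in the usage comment, are hypothetical names, not part of Spark or EMR.

```shell
#!/bin/sh
# Sketch: build the --driver-class-path value at submit time by prepending
# custom JARs to the cluster's default spark.driver.extraClassPath.
# ASSUMPTION: the default is stored in a spark-defaults.conf-style file as a
# whitespace-separated "key value" line (verify this on your EMR release).

# Print the value of <key> from a spark-defaults.conf-style <file>.
# Usage: read_spark_default <file> <key>
read_spark_default() {
  # Match the key in the first field, strip the key and the whitespace after
  # it, print the remainder, and stop at the first match.
  awk -v k="$2" '$1 == k { sub(/^[ \t]*[^ \t]+[ \t]+/, ""); print; exit }' "$1"
}

# Print "<jar1>:<jar2>:...:<default extraClassPath>", skipping empty parts.
# Usage: build_driver_classpath <defaults-file> <jar-or-dir/*>...
build_driver_classpath() {
  local defaults_file default_cp entry result
  defaults_file="$1"
  shift
  default_cp="$(read_spark_default "$defaults_file" spark.driver.extraClassPath)"
  result=""
  # Custom entries come first so they take precedence over the defaults.
  for entry in "$@" "$default_cp"; do
    [ -n "$entry" ] || continue
    result="${result:+$result:}$entry"
  done
  printf '%s\n' "$result"
}

# Example (hypothetical application JAR and main class):
#   spark-submit --master yarn-cluster \
#     --driver-class-path "$(build_driver_classpath /etc/spark/conf/spark-defaults.conf /home/hadoop/my-custom-jar.jar)" \
#     --class com.example.Main /home/hadoop/my-app.jar
```

Quoting the command substitution keeps the shell from trying to expand wildcard entries such as /usr/lib/hadoop/*; the JVM expands those itself when it builds the classpath. If the key is missing from the file, the sketch falls back to just your JARs, which may drop EMR's Hadoop and EMRFS entries, so check the printed value before relying on it.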