Spark Submit Issue
Problem Description
I am trying to run a fat jar on a Spark cluster using spark-submit. I created the cluster using the "spark-ec2" executable from the Spark bundle on AWS.
The command I am using to run the jar file is
bin/spark-submit --class edu.gatech.cse8803.main.Main --master yarn-cluster ../src1/big-data-hw2-assembly-1.0.jar
Initially it gave me the error that at least one of the HADOOP_CONF_DIR or YARN_CONF_DIR environment variables must be set. I didn't know what to set them to, so I used the following command:
export HADOOP_CONF_DIR=/mapreduce/conf
Now the error has changed to
Could not load YARN classes. This copy of Spark may not have been compiled with YARN support. Run with --help for usage help or --verbose for debug output
The home directory structure is as follows:
ephemeral-hdfs hadoop-native mapreduce persistent-hdfs scala spark spark-ec2 src1 tachyon
I even set the YARN_CONF_DIR variable to the same value as HADOOP_CONF_DIR, but the error message did not change. I am unable to find any documentation that highlights this issue; most of it just mentions these two variables without giving further details.
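For reference, pointing both variables at the cluster's Hadoop configuration might look like the sketch below. The path is an assumption based on the ephemeral-hdfs directory visible in the listing above; verify it on your own cluster before relying on it.

```shell
# Assumed location of the Hadoop config on a spark-ec2 cluster; this path
# is a guess -- confirm it exists (e.g. ls /root/ephemeral-hdfs/conf) first.
export HADOOP_CONF_DIR=/root/ephemeral-hdfs/conf
# spark-submit only requires one of the two, but setting both is harmless.
export YARN_CONF_DIR="$HADOOP_CONF_DIR"
```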
Solution
You need to compile Spark against YARN to use it.
Follow the steps explained here: https://spark.apache.org/docs/latest/building-spark.html
Maven:
build/mvn -Pyarn -Phadoop-2.x -Dhadoop.version=2.x.x -DskipTests clean package
SBT:
build/sbt -Pyarn -Phadoop-2.x assembly
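As a concrete instance of the `2.x` placeholders above, a build against Hadoop 2.4 might look like the sketch below. The profile and version values here are illustrative assumptions; substitute the Hadoop version your cluster actually runs.

```shell
# Hypothetical values for the placeholder flags; adjust to your cluster.
HADOOP_PROFILE="hadoop-2.4"
HADOOP_VERSION="2.4.0"
# Print the full Maven invocation; drop the 'echo' to actually run the build.
echo "build/mvn -Pyarn -P${HADOOP_PROFILE} -Dhadoop.version=${HADOOP_VERSION} -DskipTests clean package"
```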
You can also download a pre-compiled version here: http://spark.apache.org/downloads.html (choose a package pre-built for Hadoop).