Spark Submit Issue

Problem Description

I am trying to run a fat jar on a Spark cluster using spark-submit. I created the cluster on AWS using the spark-ec2 executable included in the Spark bundle.

The command I am using to run the jar file is

bin/spark-submit --class edu.gatech.cse8803.main.Main --master yarn-cluster ../src1/big-data-hw2-assembly-1.0.jar

In the beginning it gave me the error that at least one of the HADOOP_CONF_DIR or YARN_CONF_DIR environment variables must be set. I didn't know what to set them to, so I used the following command:

export HADOOP_CONF_DIR=/mapreduce/conf
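
(Both variables conventionally point at the directory holding the cluster's Hadoop client configuration, i.e. core-site.xml and yarn-site.xml; setting YARN_CONF_DIR the same way is a sketch under that assumption:)

export YARN_CONF_DIR=$HADOOP_CONF_DIR   # assumption: YARN config sits alongside the Hadoop config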

Now the error has changed to

Could not load YARN classes. This copy of Spark may not have been compiled with YARN support.
Run with --help for usage help or --verbose for debug output

The home directory structure is as follows

ephemeral-hdfs  hadoop-native  mapreduce  persistent-hdfs  scala  spark  spark-ec2  src1  tachyon

I even set the YARN_CONF_DIR variable to the same value as HADOOP_CONF_DIR, but the error message did not change. I am unable to find any documentation that highlights this issue; most of it just mentions these two variables without further details.

Solution

You need to compile Spark against YARN to use it.
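
One way to confirm the diagnosis is to look for the YARN deploy classes inside the Spark assembly jar. This is a diagnostic sketch; the jar's location and name vary by Spark version, so the path below is an assumption:

# Assumption: the assembly jar lives under spark/lib/ in this spark-ec2 layout
jar tf spark/lib/spark-assembly-*.jar | grep 'org/apache/spark/deploy/yarn' | head
# No matching entries means this build was compiled without YARN support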

Follow the steps explained here: https://spark.apache.org/docs/latest/building-spark.html

Maven:

build/mvn -Pyarn -Phadoop-2.x -Dhadoop.version=2.x.x -DskipTests clean package
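
For example, with the placeholders filled in for a Hadoop 2.4 cluster (2.4.0 is only an illustration; match the version to your cluster):

build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package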

SBT:

build/sbt -Pyarn -Phadoop-2.x assembly
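
The SBT equivalent, under the same Hadoop 2.4 assumption:

build/sbt -Pyarn -Phadoop-2.4 assembly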

You can also download a pre-compiled version here: http://spark.apache.org/downloads.html (choose a package that is "pre-built for Hadoop").
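
Once a YARN-enabled build is in place, the original submission should get past the class-loading error with both configuration variables exported. A sketch reusing the paths from the question (adjust them to your own layout):

export HADOOP_CONF_DIR=/mapreduce/conf   # directory containing core-site.xml and yarn-site.xml
export YARN_CONF_DIR=$HADOOP_CONF_DIR
bin/spark-submit --class edu.gatech.cse8803.main.Main --master yarn-cluster ../src1/big-data-hw2-assembly-1.0.jar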
