Spark job fails because it can't find the hadoop core-site.xml


Problem Description

I'm trying to run a Spark job, and I'm getting this error when I try to start the driver:

16/05/17 14:21:42 ERROR SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: Added file file:/var/lib/mesos/slave/slaves/0c080f97-9ef5-48a6-9e11-cf556dfab9e3-S1/frameworks/5c37bb33-20a8-4c64-8371-416312d810da-0002/executors/driver-20160517142123-0183/runs/802614c4-636c-4873-9379-b0046c44363d/core-site.xml does not exist.
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1364)
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1340)
    at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:491)
    at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:491)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:491)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
    at com.spark.test.SparkJobRunner.main(SparkJobRunner.java:56)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I have Spark running on several servers that are part of my Mesos cluster (not sure that's right, but that's what I'm doing), and I also have Hadoop running on these servers. I started the Spark master on one server and then started the Spark slaves on the other servers. I have 3 apps (not that it matters): a UI where the user can kick off Spark jobs, which puts the jobs in a Kafka queue; a launcher app that creates the Spark job using SparkLauncher (see the code below); and my Spark driver, which connects to the Kafka queue and then processes requests sent in from the UI. The UI and launcher are running in Marathon. Spark, as stated above, is its own process on the cluster, and the driver connects to Spark to run the jobs. I have uploaded hdfs-site.xml, core-site.xml, and spark-env.sh to Hadoop and point to them in my Spark context:

SparkConf conf = new SparkConf()
                .setAppName(config.getString(SPARK_APP_NAME))
                .setMaster(sparkMaster)
                .setExecutorEnv("HADOOP_USER_NAME", config.getString(HADOOP_USER, ""))
                .set("spark.mesos.uris", "<hadoop node>:9000/config/core-site.xml,<hadoop node>:9000/config/hdfs-site.xml") 
                .set("spark.files", "core-site.xml,hdfs-site.xml,spark-env.sh") 
                .set("spark.mesos.coarse", "true")
                .set("spark.cores.max", config.getString(SPARK_CORES_MAX))
                .set("spark.driver.memory", config.getString(SPARK_DRIVER_MEMORY))
                .set("spark.driver.extraJavaOptions", config.getString(SPARK_DRIVER_EXTRA_JAVA_OPTIONS, ""))
                .set("spark.executor.memory", config.getString(SPARK_EXECUTOR_MEMORY))
                .set("spark.executor.extraJavaOptions", config.getString(SPARK_EXECUTOR_EXTRA_JAVA_OPTIONS))
                .set("spark.executor.uri", hadoopPath);

Here is the code that launches the driver:

SparkLauncher launcher = new SparkLauncher()
            .setMaster(<my spark/mesos master>)
            .setDeployMode("cluster")
            .setSparkHome("/home/spark")
            .setAppResource(<hdfs://path/to/a/spark.jar>)
            .setMainClass(<my main class>);
handle = launcher.startApplication();
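
For reference, the launcher configuration above corresponds roughly to the following spark-submit invocation. This is not from the original question; the master URL, jar path, and class name are placeholders copied from the snippet above.

# Rough spark-submit equivalent of the SparkLauncher configuration above.
# The angle-bracket values are placeholders, not real endpoints.
export SPARK_HOME=/home/spark
$SPARK_HOME/bin/spark-submit \
    --master <my spark/mesos master> \
    --deploy-mode cluster \
    --class <my main class> \
    hdfs://path/to/a/spark.jar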

I'm sure I'm doing something wrong; I just can't figure out what. I'm new to Spark, Hadoop, and Mesos, so feel free to point out anything else I'm doing wrong.

Recommended Answer

My problem was that I hadn't set HADOOP_CONF_DIR in $SPARK_HOME/spark-env.sh on each server in my cluster. Once I set that, I was able to get my Spark job to start correctly. I also realized I didn't need to include the core-site.xml, hdfs-site.xml, or spark-env.sh files in the SparkConf, so I removed the line that sets "spark.files".
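
As a concrete illustration of that fix (a minimal sketch, not from the original answer; the Hadoop configuration path is an assumption and will differ per installation), the addition to $SPARK_HOME/spark-env.sh on each node looks like this:

# $SPARK_HOME/spark-env.sh (on every node in the cluster)
# Point Spark at the directory that holds core-site.xml and hdfs-site.xml.
# /etc/hadoop/conf is an assumed location; adjust to your Hadoop install,
# e.g. $HADOOP_HOME/etc/hadoop.
export HADOOP_CONF_DIR=/etc/hadoop/conf

With HADOOP_CONF_DIR set, Spark reads the Hadoop client configuration from that directory, so there is no need to ship core-site.xml and hdfs-site.xml through "spark.files".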
