Spark-submit ClassNotFound exception


Problem description

I'm having problems with a ClassNotFoundException using this simple example:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

import java.net.URLClassLoader

import scala.util.Marshal

class ClassToRoundTrip(val id: Int) extends scala.Serializable {
}

object RoundTripTester {

  def test(id : Int) : ClassToRoundTrip = {

    // Get the current classpath and output. Can we see simpleapp jar?
    val cl = ClassLoader.getSystemClassLoader
    val urls = cl.asInstanceOf[URLClassLoader].getURLs
    urls.foreach(url => println("Executor classpath is:" + url.getFile))

    // Simply instantiating an instance of object and using it works fine.
    val testObj = new ClassToRoundTrip(id)
    println("testObj.id: " + testObj.id)

    val testObjBytes = Marshal.dump(testObj)
    val testObjRoundTrip = Marshal.load[ClassToRoundTrip](testObjBytes)  // <<-- ClassNotFoundException here
    testObjRoundTrip
  }
}

object SimpleApp {
  def main(args: Array[String]) {

    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)

    val cl = ClassLoader.getSystemClassLoader
    val urls = cl.asInstanceOf[URLClassLoader].getURLs
    urls.foreach(url => println("Driver classpath is: " + url.getFile))

    val data = Array(1, 2, 3, 4, 5)
    val distData = sc.parallelize(data)
    distData.foreach(x=> RoundTripTester.test(x))
  }
}

In local mode, submitting as per the docs generates a ClassNotFoundException at the Marshal.load call (marked in the code above), where the ClassToRoundTrip object is deserialized. Strangely, the earlier direct use, new ClassToRoundTrip(id), is okay:

spark-submit --class "SimpleApp" \
             --master local[4] \
             target/scala-2.10/simpleapp_2.10-1.0.jar
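
The failure point is consistent with a classloader mismatch: Marshal.load deserializes with a plain ObjectInputStream, which resolves classes through the classloader that loaded the Scala library, while Spark makes the application jar available to executor tasks through a separate classloader that it installs as the thread context classloader. As a sketch only (ContextClassLoaderInputStream is a name introduced here, not part of the original program, and this assumes Marshal.dump emits a standard Java serialization stream), a deserializer that resolves classes through the context classloader would look like this:

import java.io.{ByteArrayInputStream, ObjectInputStream, ObjectStreamClass}

// Sketch: resolve classes via the thread context classloader, which on
// executors should include the user jar, unlike the default lookup.
class ContextClassLoaderInputStream(bytes: Array[Byte])
    extends ObjectInputStream(new ByteArrayInputStream(bytes)) {
  override def resolveClass(desc: ObjectStreamClass): Class[_] =
    Class.forName(desc.getName, false, Thread.currentThread.getContextClassLoader)
}

// Usage, replacing Marshal.load in RoundTripTester.test:
// val in = new ContextClassLoaderInputStream(testObjBytes)
// val testObjRoundTrip = in.readObject().asInstanceOf[ClassToRoundTrip]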

However, if I add extra parameters for "--driver-class-path" and "--jars", it works fine in local mode:

spark-submit --class "SimpleApp" \
             --master local[4] \
             --driver-class-path /home/xxxxxxx/workspace/SimpleApp/target/scala-2.10/simpleapp_2.10-1.0.jar \
             --jars /home/xxxxxxx/workspace/SimpleApp/target/scala-2.10/SimpleApp.jar \
             target/scala-2.10/simpleapp_2.10-1.0.jar
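
For reference, the same effect as --jars can be achieved programmatically with SparkConf.setJars. A minimal sketch, reusing the placeholder path from the commands above:

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: ship the application jar to executors from code instead of --jars.
// The path below is the same placeholder used in the commands above.
val conf = new SparkConf()
  .setAppName("Simple Application")
  .setJars(Seq("/home/xxxxxxx/workspace/SimpleApp/target/scala-2.10/simpleapp_2.10-1.0.jar"))
val sc = new SparkContext(conf)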

However, submitting to a local dev master still generates the same issue:

spark-submit --class "SimpleApp" \
             --master spark://localhost.localdomain:7077 \
             --driver-class-path /home/xxxxxxx/workspace/SimpleApp/target/scala-2.10/simpleapp_2.10-1.0.jar \
             --jars /home/xxxxxxx/workspace/SimpleApp/target/scala-2.10/simpleapp_2.10-1.0.jar \
             target/scala-2.10/simpleapp_2.10-1.0.jar

I can see from the output that the JAR file is being fetched by the executor.

Logs for one of the executors are here:

stdout: http://pastebin.com/raw.php?i=DQvvGhKm

stderr: http://pastebin.com/raw.php?i=MPZZVa0Q

I'm using Spark 1.0.2. ClassToRoundTrip is included in the JAR. I would rather not have to hardcode values in SPARK_CLASSPATH or SparkContext.addJar. Can anyone help?

Answer

I had this same issue. If the master is local, the program runs fine for most people; but if it is set to something like "spark://myurl:7077" (as happened to me), it doesn't work. Most people hit the error because an anonymous class is not found during execution. It is resolved by using SparkContext.addJar("path to jar").

Make sure you are doing the following:

  • SparkContext.addJar("path to the jar created by Maven [hint: mvn package]") (a sketch combining these steps follows the note below).
  • I used SparkConf.setMaster("spark://myurl:7077") in code and supplied the same master as an argument when submitting the job via the command line.
  • When you specify the class on the command line, make sure you write its fully qualified name, e.g. "packageName.ClassName".
  • The final command should look like this:
    bin/spark-submit --class "packageName.ClassName" --master spark://myurl:7077 pathToYourJar/target/yourJarFromMaven.jar

Note: the jar path pathToYourJar/target/yourJarFromMaven.jar in the last point is also set in code, as in the first point of this answer.
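
Putting the checklist together in code, a minimal sketch (the master URL and jar path are the answer's placeholders, not real values):

import org.apache.spark.{SparkConf, SparkContext}

// Sketch combining the steps above: set the master in code to match the
// command line, and register the Maven-built jar so executors can load
// classes from it.
val conf = new SparkConf()
  .setAppName("Simple Application")
  .setMaster("spark://myurl:7077")
val sc = new SparkContext(conf)
sc.addJar("pathToYourJar/target/yourJarFromMaven.jar")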
