Spark job fails due to java.io.NotSerializableException: org.apache.spark.SparkContext


Problem description

I am facing the above exception when I try to apply a method (computeDwt) to an RDD[(Int, ArrayBuffer[(Int, Double)])] input. I am even using the extends Serialization option to serialize objects in Spark. Here is the code snippet.

input: series: RDD[(Int, ArrayBuffer[(Int, Double)])]
DWTsample extends Serialization is a class having a computeDwt function.
sc: SparkContext

val kk: RDD[(Int, List[Double])] = series.map(t => (t._1, new DWTsample().computeDwt(sc, t._2)))

Error:
org.apache.spark.SparkException: Job failed: java.io.NotSerializableException: org.apache.spark.SparkContext
org.apache.spark.SparkException: Job failed: java.io.NotSerializableException: org.apache.spark.SparkContext
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:760)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:758)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:758)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:556)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:503)
at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:361)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:441)
at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:149)

Could anyone suggest what the problem might be and what should be done to overcome this issue?

Solution

The line

series.map(t => (t._1, new DWTsample().computeDwt(sc, t._2)))

references the SparkContext (sc) but SparkContext isn't serializable. SparkContext is designed to expose operations that are run on the driver; it can't be referenced/used by code that's run on workers.

You'll have to re-structure your code so that sc isn't referenced in your map function closure.
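
For illustration, here is a minimal sketch of one way such a restructuring could look, assuming computeDwt can be rewritten to operate on the local ArrayBuffer alone and does not actually need a SparkContext. The class body and the transform helper below are hypothetical, not the original code.

import scala.collection.mutable.ArrayBuffer
import org.apache.spark.rdd.RDD

// Hypothetical rewrite: computeDwt takes only the local data, so the closure
// shipped to the workers never needs to capture the SparkContext.
class DWTsample extends Serializable {
  def computeDwt(data: ArrayBuffer[(Int, Double)]): List[Double] =
    data.map(_._2).toList // placeholder for the real wavelet computation
}

// sc stays on the driver; the map closure only captures a DWTsample instance
// and the tuple elements, all of which are serializable.
def transform(series: RDD[(Int, ArrayBuffer[(Int, Double)])]): RDD[(Int, List[Double])] =
  series.map { case (key, values) => (key, new DWTsample().computeDwt(values)) }

The key point is that everything the map closure captures must be serializable so it can be shipped to the workers; anything that must stay on the driver, such as sc, has to be kept out of the closure.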
