Spark concurrent jobs fail
Problem description
If I run a single job with Spark on yarn-client, everything works fine, but with multiple (>1) concurrent jobs I get the following exception on the container nodes. I'm using Spark 1.2 with CDH 5.3 and Spark-Jobserver.
java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_3_piece0 of broadcast_3
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1011)
at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.SparkException: Failed to get broadcast_3_piece0 of broadcast_3
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:136)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:119)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:174)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1008)
... 11 more
15/02/02 19:20:17 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 1
15/02/02 19:20:17 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)
15/02/02 19:20:17 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 3
15/02/02 19:20:17 ERROR executor.Executor: Exception in task 1.0 in stage 0.0 (TID 1)
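Check the value of spark.cleaner.ttl in your SparkConf, for example SparkConf.set("spark.cleaner.ttl", "10000"). This error can occur when your program runs longer than that value: Spark periodically forgets metadata older than the TTL, and that may include broadcast blocks that tasks still need. If that is the case, increase the value. It is given in seconds.
For more details, see the configuration documentation: http://spark.apache.org/docs/latest/configuration.html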
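As a rough illustration of the suggestion above, the following sketch sets spark.cleaner.ttl on a SparkConf before the context is created. The application name and the value 86400 (one day, in seconds) are assumptions chosen for the example, not values from the question. It also assumes the master is supplied by spark-submit. With Spark-Jobserver the context is created by the server, so the property would have to be set wherever that context is configured.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CleanerTtlExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("cleaner-ttl-example")   // hypothetical application name
      // TTL in seconds; pick a value longer than the longest expected run.
      // 86400 is only an illustrative choice.
      .set("spark.cleaner.ttl", "86400")

    val sc = new SparkContext(conf)
    try {
      // Print the effective setting to confirm it was picked up.
      println(sc.getConf.get("spark.cleaner.ttl"))
    } finally {
      sc.stop()
    }
  }
}
```

The same property can presumably also be passed on the command line with spark-submit's --conf option instead of being hard-coded, which makes it easier to tune per deployment.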