Spark concurrent jobs fail


Problem description

If I run a single job with Spark in yarn-client mode, everything works fine, but when I run multiple (>1) jobs concurrently I get the following exception on the container nodes. I'm using Spark 1.2 with CDH 5.3 and Spark-Jobserver.

java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_3_piece0 of broadcast_3
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1011)
    at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
    at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
    at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
    at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
    at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.SparkException: Failed to get broadcast_3_piece0 of broadcast_3
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:136)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:119)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:174)
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1008)
    ... 11 more
15/02/02 19:20:17 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 1
15/02/02 19:20:17 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)
15/02/02 19:20:17 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 3
15/02/02 19:20:17 ERROR executor.Executor: Exception in task 1.0 in stage 0.0 (TID 1)

Solution

Check whether spark.cleaner.ttl is set in your SparkConf, for example SparkConf.set("spark.cleaner.ttl", "10000"). The value is given in seconds, and Spark periodically forgets metadata older than that duration. If your program runs longer than the configured value, this error may occur. Try increasing the value. For more details, see [http://spark.apache.org/docs/latest/configuration.html]
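
As a rough illustration of the suggestion above, here is a minimal Scala sketch that sets a larger spark.cleaner.ttl on a SparkConf and reads it back. The app name and the 24-hour value are assumptions chosen for the example, not values from the question. With Spark-Jobserver the setting would have to reach whichever SparkConf the shared context is built from, which is not shown here. The property belongs to Spark 1.x releases such as the 1.2 version mentioned in the question.

```scala
import org.apache.spark.SparkConf

object CleanerTtlExample {
  def main(args: Array[String]): Unit = {
    // Assumed value: pick something longer than the longest time the
    // application (or shared context) is expected to stay alive.
    val ttlSeconds = 24 * 60 * 60 // 24 hours, expressed in seconds

    val conf = new SparkConf()
      .setAppName("cleaner-ttl-example") // hypothetical app name
      .set("spark.cleaner.ttl", ttlSeconds.toString)

    // Read the setting back to confirm what the application will use.
    println(conf.get("spark.cleaner.ttl"))
  }
}
```

If the error persists after raising the value, that would suggest the TTL is not the cause in your setup.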
