星火作业完成，但应用程序需要时间来关闭 [英] Spark jobs finishes but application takes time to close

查看：131 发布时间：2016/5/22 15:33:41 scala amazon-s3 apache-spark

本文介绍了星火作业完成，但应用程序需要时间来关闭的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

使用Scala运行火花的工作，预期所有作业都按时完成了，但不知何故，一些信息日志打印20-25分钟停止工作之前。

发布一些UI截图，从而有助于将已了解的问题。

以下为4个阶段所用的时间：

<醇开始=2>

以下是连续的作业ID之间的时间

我不明白为什么有这两个作业ID之间花了这么多时间。

以下是我的code片断：

  VAL SC =新SparkContext（CONF）
为（X下;  -  0到10）{
  VAL ZZ = getFilesList（LIN）;
  VAL链接= zz._1
  VAL路径= zz._2
  林= zz._3
  VAL Z = sc.textFile（links.mkString（））图（T =＆GT; t.split（'\\ t'））。滤波器（T =＆GT; T（4）==XX＆放大器;＆amp; T公司（6）==X）图（T =＆GT; titan2（T）。）过滤器（T =＆GT; t.length→35）.MAP（T =＆GT;（（T（ 34）），（T（35），叔（5），T（32），叔（33））））
  VAL way_nodes = sc.textFile（way_source）.MAP（T =＆GT; t.split（））的地图。（T =＆GT;（T（0），T（1）））;
  VAL T = z.join（way_nodes）.MAP（T =＆GT;（t._2._1._2，阵列（阵列（t._2._1._2，t._2._1._3，t._2._1 ._4，t._2._1._1，t._2._2））））reduceByKey（（T，Y）= GT;：T + Y）.MAP（T =＆GT;过程（t））的flatMap。 （T =＆GT; T）.combineByKey（createTimeCombiner，timeCombiner，timeMerger）.MAP（averagingFunction）.MAP（T =＆GT; t._1 +，+ t._2）
  t.saveAsTextFile（路径）
}
sc.stop（）

更多的一些后续：<一href=\"http://stackoverflow.com/questions/32342214/spark-1-4-1-saveastextfile-to-s3-is-very-slow-on-emr-4-0-0\">spark-1.4.1 saveAsTextFile到S3是EMR-4.0.0

很慢

解决方案

我结束了我的升级版本火花和问题就解决了。

Running spark job using scala, as expected all jobs are finishing up on time , but somehow some INFO logs are printed for 20-25 minutes before job stops.

Posting few UI screenshot which can help to undestand the problem .

Following is time taken by 4 stages :

Following is time between consecutive job ids

I dont understand why there is so much time spent in between both job ids.

Following is my code snippet:

    val sc = new SparkContext(conf)
for (x <- 0 to 10) {
  val zz = getFilesList(lin);
  val links = zz._1
  val path = zz._2
  lin = zz._3
  val z = sc.textFile(links.mkString(",")).map(t => t.split('\t')).filter(t => t(4) == "xx" && t(6) == "x").map(t => titan2(t)).filter(t => t.length > 35).map(t => ((t(34)), (t(35), t(5), t(32), t(33))))
  val way_nodes = sc.textFile(way_source).map(t => t.split(";")).map(t => (t(0), t(1)));
  val t = z.join(way_nodes).map(t => (t._2._1._2, Array(Array(t._2._1._2, t._2._1._3, t._2._1._4, t._2._1._1, t._2._2)))).reduceByKey((t, y) => t ++ y).map(t => process(t)).flatMap(t => t).combineByKey(createTimeCombiner, timeCombiner, timeMerger).map(averagingFunction).map(t => t._1 + "," + t._2)
  t.saveAsTextFile(path)
}
sc.stop()

Some more followup : spark-1.4.1 saveAsTextFile to S3 is very slow on emr-4.0.0

解决方案

I ended up upgrading my spark version and issue was resolved .

这篇关于星火作业完成，但应用程序需要时间来关闭的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

星火作业完成，但应用程序需要时间来关闭 [英] Spark jobs finishes but application takes time to close

问题描述

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

星火作业完成，但应用程序需要时间来关闭 [英] Spark jobs finishes but application takes time to close

问题描述

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭