星火作业完成,但应用程序需要时间来关闭 [英] Spark jobs finishes but application takes time to close

查看:131
本文介绍了星火作业完成,但应用程序需要时间来关闭的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

使用Scala运行火花的工作,预期所有作业都按时完成了,但不知何故,一些信息日志打印20-25分钟停止工作之前。

发布一些UI截图,从而有助于将已了解的问题。


  1. 以下为4个阶段所用的时间:

<醇开始=2>
  • 以下是连续的作业ID之间的时间

  • 我不明白为什么有这两个作业ID之间花了这么多时间。

    以下是我的code片断:

      VAL SC =新SparkContext(CONF)
    为(X下; - 0到10){
      VAL ZZ = getFilesL​​ist(LIN);
      VAL链接= zz._1
      VAL路径= zz._2
      林= zz._3
      VAL Z = sc.textFile(links.mkString())图(T =&GT; t.split('\\ t'))。滤波器(T =&GT; T(4)==XX&放大器;&amp; T公司(6)==X)图(T =&GT; titan2(T)。)过滤器(T =&GT; t.length→35).MAP(T =&GT;((T( 34)),(T(35),叔(5),T(32),叔(33))))
      VAL way_nodes = sc.textFile(way_source).MAP(T =&GT; t.split())的地图。(T =&GT;(T(0),T(1)));
      VAL T = z.join(way_nodes).MAP(T =&GT;(t._2._1._2,阵列(阵列(t._2._1._2,t._2._1._3,t._2._1 ._4,t._2._1._1,t._2._2))))reduceByKey((T,Y)= GT;:T + Y).MAP(T =&GT;过程(t))的flatMap。 (T =&GT; T).combineByKey(createTimeCombiner,timeCombiner,timeMerger).MAP(averagingFunction).MAP(T =&GT; t._1 +,+ t._2)
      t.saveAsTextFile(路径)
    }
    sc.stop()

    更多的一些后续:<一href=\"http://stackoverflow.com/questions/32342214/spark-1-4-1-saveastextfile-to-s3-is-very-slow-on-emr-4-0-0\">spark-1.4.1 saveAsTextFile到S3是EMR-4.0.0

    很慢
    解决方案

    我结束了我的升级版本火花和问题就解决了​​。

    Running spark job using scala, as expected all jobs are finishing up on time , but somehow some INFO logs are printed for 20-25 minutes before job stops.

    Posting few UI screenshot which can help to undestand the problem .

    1. Following is time taken by 4 stages :

    1. Following is time between consecutive job ids

    I dont understand why there is so much time spent in between both job ids.

    Following is my code snippet:

        val sc = new SparkContext(conf)
    for (x <- 0 to 10) {
      val zz = getFilesList(lin);
      val links = zz._1
      val path = zz._2
      lin = zz._3
      val z = sc.textFile(links.mkString(",")).map(t => t.split('\t')).filter(t => t(4) == "xx" && t(6) == "x").map(t => titan2(t)).filter(t => t.length > 35).map(t => ((t(34)), (t(35), t(5), t(32), t(33))))
      val way_nodes = sc.textFile(way_source).map(t => t.split(";")).map(t => (t(0), t(1)));
      val t = z.join(way_nodes).map(t => (t._2._1._2, Array(Array(t._2._1._2, t._2._1._3, t._2._1._4, t._2._1._1, t._2._2)))).reduceByKey((t, y) => t ++ y).map(t => process(t)).flatMap(t => t).combineByKey(createTimeCombiner, timeCombiner, timeMerger).map(averagingFunction).map(t => t._1 + "," + t._2)
      t.saveAsTextFile(path)
    }
    sc.stop()
    

    Some more followup : spark-1.4.1 saveAsTextFile to S3 is very slow on emr-4.0.0

    解决方案

    I ended up upgrading my spark version and issue was resolved .

    这篇关于星火作业完成,但应用程序需要时间来关闭的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

    查看全文
    登录 关闭
    扫码关注1秒登录
    发送“验证码”获取 | 15天全站免登陆