Killing a Spark Streaming job when there is no activity


Problem description

I want to kill my Spark Streaming job when there has been no activity (i.e. the receivers are not receiving messages) for a certain period of time. I tried this:

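// Number of consecutive empty batches seen so far
var counter = 0

myDStream.foreachRDD { rdd =>
  if (rdd.count() == 0L) {
    // Empty batch: extend the idle streak
    counter += 1
    if (counter == 40) {
      // Stop the SparkContext as well, and shut down gracefully
      ssc.stop(true, true)
    }
  } else {
    // Data arrived: reset the idle streak
    counter = 0
  }
}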

Is there a better way of doing this? How would I make a variable available to all receivers and increment it by 1 whenever there is no activity?

Recommended answer

Use a NoSQL table, such as Cassandra or HBase, to keep the counter. You cannot handle stream polling inside a loop. Implement the same logic using NoSQL or MariaDB, and perform a graceful shutdown of the streaming job if no activity is happening. The way I did it was to maintain a table in MariaDB for the streaming job, with a polling interval of 5 minutes. Every 5 minutes the job hits the database and writes the count of records it consumed; the same method also returns the count of zero-record line items for the latest timestamp. This helped me a lot in managing my streaming job. The table also helps me automatically trigger the streaming job based on logic written in a shell script.
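As a rough illustration of the idea above, here is a minimal sketch in plain Scala. The names `BatchCountStore`, `InMemoryStore`, and `IdleMonitor` are hypothetical and do not come from the answer, and the in-memory store only stands in for the MariaDB, Cassandra, or HBase table, whose schema the answer does not describe. The driver-side wiring is shown in comments because it needs Spark Streaming on the classpath; the 10-second timeout and the threshold of 40 batches are assumptions.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical abstraction over the table that stores per-batch record counts.
trait BatchCountStore {
  // Records the count for one batch and returns the current number of
  // consecutive empty batches.
  def record(count: Long): Int
}

// In-memory stand-in for the external table, for illustration only.
final class InMemoryStore extends BatchCountStore {
  private val emptyStreak = new AtomicInteger(0)

  override def record(count: Long): Int =
    if (count == 0L) emptyStreak.incrementAndGet()
    else { emptyStreak.set(0); 0 }
}

// Flags the job as idle once enough consecutive empty batches were recorded.
final class IdleMonitor(store: BatchCountStore, maxEmptyBatches: Int) {
  @volatile private var idle = false

  def onBatch(count: Long): Unit =
    if (store.record(count) >= maxEmptyBatches) idle = true

  def shouldStop: Boolean = idle
}

// Possible driver-side wiring (assumes an existing `ssc` and `myDStream`):
//
//   val monitor = new IdleMonitor(new InMemoryStore, maxEmptyBatches = 40)
//   myDStream.foreachRDD { rdd => monitor.onBatch(rdd.count()) }
//   ssc.start()
//   var terminated = false
//   while (!terminated && !monitor.shouldStop)
//     terminated = ssc.awaitTerminationOrTimeout(10000L)
//   if (!terminated) ssc.stop(stopSparkContext = true, stopGracefully = true)
```

In this sketch the graceful stop is issued from the main driver thread rather than from inside `foreachRDD`, so the batch that detects the idle condition can finish before shutdown begins. Swapping `InMemoryStore` for a database-backed implementation would let an external script read the same counts, as the answer describes.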
