How do I stop a Spark Streaming job?

Question

I have a Spark Streaming job which has been running continuously. How do I stop the job gracefully? I have read the usual recommendation of attaching a shutdown hook in the job and sending a SIGTERM to it:

// Register a JVM shutdown hook; it runs when the driver receives SIGTERM
sys.ShutdownHookThread {
  logger.info("Gracefully stopping Application...")
  // Stop the StreamingContext, letting in-flight batches finish first
  ssc.stop(stopSparkContext = true, stopGracefully = true)
  logger.info("Application stopped gracefully")
}

It seems to work, but it does not look like the cleanest way to stop the job. Am I missing something here?

From a code perspective that may make sense, but how do you use it in a cluster environment? If we start a Spark Streaming job (the jobs are distributed across all the nodes in the cluster), we have to keep track of the PID of the job and the node it is running on. Finally, when we have to stop the process, we need to know which node the job is running on and the PID of the process there. I was just hoping there would be a simpler form of job control for streaming jobs.

Answer

You can stop your streaming context in cluster mode by running the following command, without needing to send a SIGTERM. This stops the streaming context without you having to stop it explicitly with a shutdown hook.

$SPARK_HOME_DIR/bin/spark-submit --master $MASTER_REST_URL --kill $DRIVER_ID

- $MASTER_REST_URL is the REST URL of the Spark standalone master, e.g. spark://localhost:6066

- $DRIVER_ID is something like driver-20150915145601-0000
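
For illustration, here is what the full round trip might look like against a standalone master; the host name, application class, jar name, and driver ID below are all hypothetical:

# Submit in cluster mode through the master's REST endpoint (port 6066);
# the submission response includes the driver ID
$SPARK_HOME/bin/spark-submit --master spark://master-host:6066 --deploy-mode cluster --class com.example.StreamingApp my-streaming-app.jar

# Later, stop the job from any machine that can reach the master
$SPARK_HOME/bin/spark-submit --master spark://master-host:6066 --kill driver-20150915145601-0000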

If you want Spark to stop your app gracefully, you can try setting the following system property when your Spark app is initially submitted (see http://spark.apache.org/docs/latest/submitting-applications.html on setting Spark configuration properties):

spark.streaming.stopGracefullyOnShutdown=true
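
For example, the property can be passed at submit time with spark-submit's --conf flag (the class and jar names here are hypothetical):

$SPARK_HOME/bin/spark-submit --conf spark.streaming.stopGracefullyOnShutdown=true --class com.example.StreamingApp my-streaming-app.jar

With this flag set, Spark's own shutdown hook stops the StreamingContext gracefully when the driver JVM is terminated, so an explicit sys.ShutdownHookThread like the one in the question becomes unnecessary.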

The flag is not officially documented; I gathered it from reading the 1.4 source code. It is honored in standalone mode. I haven't tested it in cluster mode yet.

I am working with Spark 1.4.*.
