How does Spark handle exceptions for a Spark streaming job?


Problem description

This question might seem pretty broad as it stands; however, I have two specific situations that are better put together than treated separately. To start with, I'm reading data from Kafka into a DStream using the spark-streaming-kafka API. Assume I have one of the following two situations:

// Situation 1: something goes wrong on the driver
// (the body of transform runs on the driver once per batch)
dstream.transform { rdd =>
  throw new Exception
}

// Situation 2: something goes wrong on the executors
// (the partition-level code runs on the executors)
dstream.transform { rdd =>
  rdd.foreachPartition { partition =>
    throw new Exception
  }
  rdd // transform must return an RDD, otherwise this doesn't compile
}

This typically describes a situation in which I need to stop the application: an exception is thrown either on the driver or on one of the executors (e.g. failing to reach some external service that is crucial for the processing). If you try this locally, the app fails immediately. A bit more code:

dstream.foreachRDD { rdd =>
  // write rdd data to some output
  // update the kafka offsets
}

This is the last thing that happens in my app: pushing the data into Kafka and then making sure to move the offsets in Kafka to avoid re-processing.
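For what it's worth, a concrete version of that last step might look like the sketch below. It assumes the spark-streaming-kafka-0-10 direct stream (which is where HasOffsetRanges and CanCommitOffsets come from); writeToOutput is a hypothetical helper standing in for the actual output logic:

import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}

dstream.foreachRDD { rdd =>
  // grab the offset ranges before any shuffle destroys the partition mapping
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

  // write the batch to its output (hypothetical helper)
  writeToOutput(rdd)

  // only after the write succeeded, move the offsets in Kafka
  dstream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
}

Committing only after the write keeps at-least-once semantics: if the write fails, the offsets stay put and the batch is re-processed rather than lost.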

Other notes:

  • I'm running Spark 2.0.1 on top of Mesos, using Marathon
  • Checkpointing and write-ahead logs are disabled

I'm expecting the application to shut down when an exception is thrown (just as if I were running it locally), because I need fail-fast behavior. What sometimes happens instead is that, after an exception occurs, the app still shows as running in Marathon; even worse, in some situations the Spark UI can still be accessed, although nothing is being processed anymore.
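As an aside, one common way to make the driver die loudly is to let awaitTermination() re-throw the streaming failure and then exit non-zero so Marathon sees the task as failed. A minimal sketch (the stop flags and exit code here are assumptions, not something from the question):

try {
  ssc.start()
  // awaitTermination re-throws exceptions raised during stream processing
  ssc.awaitTermination()
} catch {
  case e: Exception =>
    // stop everything non-gracefully, then exit non-zero so Marathon notices
    ssc.stop(stopSparkContext = true, stopGracefully = false)
    sys.exit(1)
}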

What could be the cause of this?

Recommended answer

Your examples only show transformations. With Spark, only actions throw exceptions, because transformations are executed lazily. I would guess that any attempt to actually write your results somewhere will end up failing fast.
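To illustrate the point (a minimal sketch, not the asker's actual pipeline):

dstream.transform { rdd =>
  // transformation: only recorded in the lineage, nothing runs or throws yet
  rdd.map { _ => throw new Exception("boom") }
}.foreachRDD { rdd =>
  // action: the batch job actually executes here, so the exception surfaces
  rdd.count()
}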
