Uncaught Exception Handling in Spark


Problem Description

I am working on a Java-based Spark Streaming application which responds to messages that come through a Kafka topic. For each message, the application does some processing and writes the results back to a different Kafka topic.

Sometimes, due to unexpected data-related issues, the code that operates on RDDs might fail and throw an exception. When that happens, I would like to have a generic handler that could take the necessary action and drop a message onto an error topic. Right now, these exceptions are only written to Spark's log by Spark itself.

What is the best approach to do this, instead of writing a try-catch block around every piece of code that operates on RDDs?
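For illustration, the kind of per-action boilerplate being avoided might look like this (a hypothetical sketch; sendToErrorTopic and the RDD are placeholders, not code from the question):

import org.apache.spark.rdd.RDD

// Hypothetical placeholders standing in for the real application code.
def sendToErrorTopic(msg: String): Unit = ???
val processedRdd: RDD[String] = ???

try {
  processedRdd.count()                 // one action, wrapped individually
} catch {
  case e: Exception => sendToErrorTopic(e.getMessage)
}

try {
  processedRdd.saveAsTextFile("out")   // the next action, wrapped again
} catch {
  case e: Exception => sendToErrorTopic(e.getMessage)
}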

Solution

You could write a generic function that does this. You only need to wrap it around RDD actions, since those are the only operations that can actually throw Spark exceptions (transformations like .map and .filter are lazy and only execute when an action runs).
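As a minimal, self-contained sketch of that laziness (assuming a local SparkSession; the sample data is made up), note that the exception from the bad record only surfaces at the action:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("lazy-demo")
  .master("local[*]")
  .getOrCreate()

// "oops" will fail to parse, but the map below throws nothing yet --
// transformations only record lineage.
val parsed = spark.sparkContext.parallelize(Seq("1", "2", "oops")).map(_.toInt)

try {
  parsed.count() // the NumberFormatException surfaces here, inside the action
} catch {
  case e: Exception => println(s"Action failed: ${e.getMessage}")
}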

(Assuming this is in Scala) You could maybe even try something with implicits. Make a class that holds an RDD and handles the error. Here's a sketch of what that might look like:

import org.apache.spark.rdd.RDD
import scala.util.Try

// Implicit wrapper adding a failsafeAction method to any RDD.
implicit class FailSafeRDD[T](rdd: RDD[T]) {
  def failsafeAction[U](fn: RDD[T] => U): Try[U] = Try {
    fn(rdd)
  }
}

You could add the error-topic messaging into failsafeAction, or anything else you want to do on every failure. Usage might then look like this:

val rdd = ??? // some RDD you already have
val resultOrException = rdd.failsafeAction { r => r.count() } // a Try[Long]
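For instance, a sketch of wiring the error-topic messaging into failsafeAction might look like the following (the broker address, serializer settings, and topic name are assumptions, not from the original answer):

import java.util.Properties

import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.spark.rdd.RDD

import scala.util.Try

implicit class FailSafeRDD[T](rdd: RDD[T]) {
  def failsafeAction[U](fn: RDD[T] => U): Try[U] = {
    val result = Try(fn(rdd))
    // On failure, publish the exception message to an error topic.
    result.failed.foreach { e =>
      // Broker address, serializers, and topic name are placeholders.
      val props = new Properties()
      props.put("bootstrap.servers", "localhost:9092")
      props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
      props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
      val producer = new KafkaProducer[String, String](props)
      try {
        producer.send(new ProducerRecord[String, String]("error-topic", e.getMessage))
      } finally {
        producer.close()
      }
    }
    result
  }
}

In a real application you would likely share one long-lived producer rather than building a new one per failure.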

Besides this, though, I imagine the "best" approach is somewhat subjective to the application's needs.
