Apache Flink - exception handling in "keyBy"

Question

It may happen that data entering a Flink job triggers an exception, either due to a bug in the code or a lack of validation. My goal is to provide a consistent way of handling exceptions that our team can use within Flink jobs, one that won't cause any downtime in production.

  1. Restart strategies do not seem to be applicable here, as:

  • a simple restart won't fix the issue, and we fall into a restart loop
  • we cannot simply skip the event
  • they can be good for an OOME or some transient issues
  • we cannot add a custom one

"keyBy"功能中的try/catch块不能完全起到以下作用:

try/catch block in "keyBy" function does not fully help as:

  • there is no way to skip the event in "keyBy" after the exception is handled

Sample code:

env.addSource(kafkaConsumer)
    .keyBy(keySelector) // must return one result for one entry
    .flatMap(mapFunction) // we can skip some entries here in case of errors
    .addSink(new PrintSinkFunction<>());
env.execute("Flink Application");

I'd like to have the ability to skip processing of an event that causes an issue in "keyBy", and likewise in similar methods that are supposed to return exactly one result.

Answer

Besides the suggestion of @phanhuy152 (which seems totally legit to me), why not filter before keyBy?

env.addSource(kafkaConsumer)
    .filter(invalidKeys)
    .keyBy(keySelector) // must return one result for one entry
    .flatMap(mapFunction) // we can skip some entries here in case of errors
    .addSink(new PrintSinkFunction<>());
env.execute("Flink Application");
