Flink:在 Flink 作业中处理异常的最佳方法是什么 [英] Flink: what's the best way to handle exceptions inside Flink jobs

查看:118
本文介绍了Flink:在 Flink 作业中处理异常的最佳方法是什么的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个 flink 工作,它接收 Kafaka 主题并通过一堆操作符.我想知道处理中间发生的异常的最佳方法是什么.

I have a flink job that takes in Kafaka topics and goes through a bunch of operators. I'm wondering what's the best way to deal with exceptions that happen in the middle.

我的目标是有一个集中的地方来处理那些可能从不同操作员抛出的异常,这是我目前的解决方案:

My goal is to have a centralized place to handle those exceptions that may be thrown from different operators and here is my current solution:

在catch块中使用ProcessFunction并将sideOutput输出到context,假设有异常,并且有一个单独的sink函数用于sideOutput 最后调用外部服务来更新另一个相关作业的状态

Use ProcessFunction and output sideOutput to context in the catch block, assuming there is an exception, and have a separate sink function for the sideOutput at the end where it calls an external service to update the status of another related job

但是,我的问题是,通过这样做,我似乎仍然需要调用 collector.collect() 并传入一个空值,以便继续执行以下运算符并达到最后一个阶段 collector.collect()code>sideOutput 将流入单独的接收器函数.这是正确的做法吗?

However, my question is that by doing so it seems I still need to call collector.collect() and pass in a null value in order to proceed to following operators and hit last stage where sideOutput will flow into the separate sink function. Is this the right way to do it?

另外,我不确定如果我不在操作符内部调用 collector.collect() 实际会发生什么,它会挂在那里并导致内存泄漏吗?

Also I'm not sure what actually happens if I don't call collector.collect() inside a operator, would it hang there and cause memory leak?

推荐答案

不调用 collector.collect() 就好.当您使用侧输出捕获异常时,您不需要使用空值调用 collect() - 每个操作员都可以有自己的侧输出.最后,如果你有多个这样的操作符带有异常的侧输出,你可以在将该流发送到接收器之前union()将侧输出一起.

It's fine to not call collector.collect(). And you don't need to call collect() with a null value when you use the side output to capture the exception - each operator can have its own side output. Finally, if you have multiple such operators with a side output for exceptions, you can union() the side outputs together before sending that stream to a sink.

如果由于某种原因下游运算符需要知道存在异常,那么一种方法是输出 Either,但随后每个下游运算符都会当然需要有代码来检查它收到的内容.

If for some reason the downstream operator(s) need to know that there was an exception, then one approach is to output an Either<good result, Exception>, but then each downstream operator would of course need to have code to check what it's receiving.

这篇关于Flink:在 Flink 作业中处理异常的最佳方法是什么的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆