Understanding flink savepoints & checkpoints


Problem description

Considering an Apache Flink streaming-application with a pipeline like this:

Kafka-Source -> flatMap 1 -> flatMap 2 -> flatMap 3 -> Kafka-Sink

where every flatMap function is a non-stateful operator (e.g. the normal .flatMap function of a Datastream).

How do checkpoints/savepoints work, in case an incoming message will be pending at flatMap 3? Will the message be reprocessed after restart beginning from flatMap 1 or will it skip to flatMap 3?

I am a bit confused, because the documentation seems to refer to application state as what I can use in stateful operators, but I don't have stateful operators in my application. Is the "processing progress" saved & restored at all, or will the whole pipeline be re-processed after a failure/restart?

And is there a difference between a failure (-> Flink restores from a checkpoint) and a manual restart using savepoints, regarding my previous questions?

I tried finding out myself (with checkpointing enabled using EXACTLY_ONCE and the rocksdb backend) by placing a Thread.sleep() in flatMap 3 and then cancelling the job with a savepoint. However, this led to the Flink command-line tool hanging until the sleep was over, and even then flatMap 3 was executed and its output even sent to the sink before the job got cancelled. So it seems I cannot manually force this situation to analyze Flink's behaviour.

In case the "processing progress" is not saved/covered by the checkpoints/savepoints as I described above, how could I make sure that, for every message reaching my pipeline, any given operator (flatMap 1/2/3) is never re-processed in a restart/failure situation?

Answer

When a checkpoint is taken, every task (parallel instance of an operator) checkpoints its state. In your example, the three flatmap operators are stateless, so there is no state to be checkpointed. The Kafka source is stateful and checkpoints the reading offsets for all partitions.
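What such a checkpoint contains can be sketched in plain Python (this is an illustrative model, not the Flink API; the function and field names are made up): for a job whose only stateful task is the Kafka source, the checkpoint effectively amounts to the per-partition read offsets, while the stateless flatMap tasks contribute nothing.

```python
def take_checkpoint(source_offsets, stateless_operators):
    """Simulate a checkpoint of this job: only the source offsets carry state;
    each stateless operator has no state to snapshot."""
    return {
        # e.g. {partition: next offset to read} per Kafka partition
        "kafka_offsets": dict(source_offsets),
        # stateless tasks checkpoint no state at all
        "operator_states": {op: None for op in stateless_operators},
    }

cp = take_checkpoint({0: 42, 1: 17}, ["flatMap1", "flatMap2", "flatMap3"])
```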

In case of a failure, the job is recovered and all tasks load their state, which for the source operator means that the read offsets are reset. Hence, the application will reprocess all events since the last checkpoint.
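These replay semantics can be simulated in plain Python (an illustrative sketch, not the Flink API; the records and flatMap bodies are made up). Note that there is no per-operator "processing progress": a record that was pending at flatMap 3 when the job crashed is replayed from the source and passes through flatMap 1 and 2 again.

```python
# Three stateless flatMap stages, modeled as pure record -> list-of-records functions.
def flat_map1(x): return [x + 1]
def flat_map2(x): return [x * 2]
def flat_map3(x): return [x - 3]

def run_pipeline(records, start_offset):
    """Replay all records from the given source offset through the whole pipeline."""
    out = []
    for rec in records[start_offset:]:
        for a in flat_map1(rec):
            for b in flat_map2(a):
                out.extend(flat_map3(b))
    return out

records = [10, 20, 30, 40]
checkpointed_offset = 2   # last checkpoint was taken after records[0..1]
# Suppose the job crashed while records[2] was pending at flatMap 3: on restart,
# the source offset is reset to the checkpoint, so records[2] and records[3]
# are reprocessed from flatMap 1 onward.
replayed = run_pipeline(records, checkpointed_offset)
```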

In order to achieve end-to-end exactly-once, you need a special sink connector that offers either transaction support (e.g., for Kafka) or support for idempotent writes.
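Why idempotent writes give effectively-once results despite replays can be shown with a minimal sketch (plain Python, not a real connector; the sink class is hypothetical): writing the same keyed record twice leaves the sink in the same state as writing it once.

```python
class IdempotentSink:
    """Toy keyed upsert sink: replayed writes overwrite with the same value."""
    def __init__(self):
        self.store = {}

    def write(self, key, value):
        self.store[key] = value  # upsert, so duplicates are harmless

sink = IdempotentSink()
batch = [("a", 1), ("b", 2)]
for key, value in batch:
    sink.write(key, value)
# After a failure, the same records are replayed and written again:
for key, value in batch:
    sink.write(key, value)
```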
