Understanding flink savepoints & checkpoints

Question

Consider an Apache Flink streaming application with a pipeline like this:

Kafka-Source -> flatMap 1 -> flatMap 2 -> flatMap 3 -> Kafka-Sink

where every flatMap function is a stateless operator (e.g., the normal .flatMap function of a DataStream).
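
For illustration, a minimal sketch of such a pipeline in Java could look like the following (topic names, the broker address, and the pass-through flatMap are placeholders, and the Kafka connector classes are assumed to come from the universal flink-connector-kafka dependency):

import java.util.Properties;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.util.Collector;

public class Pipeline {

    // Stateless pass-through used as a stand-in for flatMap 1/2/3.
    public static class PassThrough implements FlatMapFunction<String, String> {
        @Override
        public void flatMap(String value, Collector<String> out) {
            out.collect(value);
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.setProperty("group.id", "demo-group");              // placeholder group id

        DataStream<String> source = env.addSource(
                new FlinkKafkaConsumer<>("input-topic", new SimpleStringSchema(), props));

        source.flatMap(new PassThrough())   // flatMap 1 (stateless)
              .flatMap(new PassThrough())   // flatMap 2 (stateless)
              .flatMap(new PassThrough())   // flatMap 3 (stateless)
              .addSink(new FlinkKafkaProducer<>("output-topic", new SimpleStringSchema(), props));

        env.execute("kafka-flatmap-pipeline");
    }
}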

How do checkpoints/savepoints work in case an incoming message is pending at flatMap 3? Will the message be reprocessed after a restart, starting again from flatMap 1, or will it skip ahead to flatMap 3?

I am a bit confused, because the documentation seems to refer to application state as something I can use in stateful operators, but I don't have any stateful operators in my application. Is the "processing progress" saved & restored at all, or will the whole pipeline be reprocessed after a failure/restart?

And is there a difference between a failure (-> Flink restores from a checkpoint) and a manual restart using savepoints, regarding my previous questions?

I tried to find out myself (with checkpointing enabled using EXACTLY_ONCE and the RocksDB backend) by placing a Thread.sleep() in flatMap 3 and then cancelling the job with a savepoint. However, this led to the Flink command-line tool hanging until the sleep was over, and even then flatMap 3 was executed and its output even sent to the sink before the job got cancelled. So it seems I cannot manually force this situation to analyze Flink's behaviour.
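
For reference, checkpointing with EXACTLY_ONCE and the RocksDB backend can be enabled roughly like this (a sketch; the interval and the checkpoint directory are placeholders, and the exact API differs between Flink versions):

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Inside main(), before building the pipeline:
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(10_000, CheckpointingMode.EXACTLY_ONCE);                // checkpoint every 10 s
env.setStateBackend(new RocksDBStateBackend("file:///tmp/flink-checkpoints"));  // placeholder directory

The job was then cancelled with a savepoint via the command-line client (e.g. flink cancel -s [savepointDirectory] <jobId>, depending on the Flink version).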

In case the "processing progress" is not saved/covered by the checkpoints/savepoints as described above, how could I make sure that, for every message reaching my pipeline, any given operator (flatMap 1/2/3) is never reprocessed in a restart/failure situation?

Answer

When a checkpoint is taken, every task (parallel instance of an operator) checkpoints its state. In your example, the three flatmap operators are stateless, so there is no state to be checkpointed. The Kafka source is stateful and checkpoints the reading offsets for all partitions.
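
To make the distinction concrete, here is a hypothetical stateful flatMap for comparison: it keeps a counter in Flink's managed (keyed) state, and it is exactly this kind of state that gets written into a checkpoint. The stateless flatMaps in the question have nothing comparable, so there is no operator state to snapshot for them.

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// Hypothetical stateful flatMap: the counter lives in Flink managed keyed state
// (so it must be applied after a keyBy()) and is included in checkpoints/savepoints.
public class CountingFlatMap extends RichFlatMapFunction<String, String> {

    private transient ValueState<Long> count;

    @Override
    public void open(Configuration parameters) {
        count = getRuntimeContext().getState(
                new ValueStateDescriptor<>("count", Long.class));
    }

    @Override
    public void flatMap(String value, Collector<String> out) throws Exception {
        Long current = count.value();                      // null for the first element of a key
        long updated = (current == null ? 0L : current) + 1;
        count.update(updated);
        out.collect(updated + ": " + value);
    }
}

Because the three flatMaps in the pipeline hold no such state, a checkpoint of this job effectively consists of the Kafka source offsets (plus any transactional state of the sink).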

In case of a failure, the job is recovered and all tasks load their state, which means for the source operator that the reading offsets are reset. Hence, the application will reprocess all events since the last checkpoint.

In order to achieve end-to-end exactly-once, you need a special sink connector that either offers transaction support (e.g., for Kafka) or supports idempotent writes.
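
For the Kafka case, a sketch with the universal FlinkKafkaProducer could look like this (topic, broker, and serialization schema are placeholders, and the available constructors differ between connector versions):

import java.nio.charset.StandardCharsets;
import java.util.Properties;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.streaming.connectors.kafka.KafkaSerializationSchema;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092");   // placeholder broker
// Kafka aborts transactions that outlive this timeout, so it must be large
// enough to cover the gap between two completed checkpoints.
props.setProperty("transaction.timeout.ms", "900000");

KafkaSerializationSchema<String> schema = (element, timestamp) ->
        new ProducerRecord<>("output-topic", element.getBytes(StandardCharsets.UTF_8));

FlinkKafkaProducer<String> transactionalSink = new FlinkKafkaProducer<>(
        "output-topic",                            // default target topic (placeholder)
        schema,
        props,
        FlinkKafkaProducer.Semantic.EXACTLY_ONCE); // transactional, two-phase-commit sink

With Semantic.EXACTLY_ONCE the producer writes records inside a Kafka transaction that is only committed once the corresponding checkpoint completes, so downstream consumers should read with isolation.level=read_committed to avoid seeing uncommitted data.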
