Is it possible to recover after losing the checkpoint coordinator?


Question

I'm using incremental checkpoints with RocksDB and saving the checkpoints to a remote destination (S3 in my case). What happens if someone deletes the JobManager server (where the checkpoint coordinator runs) and reinstalls it? By losing the checkpoint coordinator, do I also lose the ability to recover state from the checkpoints? As far as I know, the coordinator holds all the references to the checkpoints.

Recommended answer

If you run Flink with high availability enabled, Flink stores pointers to its checkpoints in ZooKeeper. If the JobManager fails, Flink recovers all checkpoints from ZooKeeper and can resume the jobs from the latest completed checkpoint.
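As a rough illustration of the setup described above, a `flink-conf.yaml` fragment might look like the sketch below. The ZooKeeper hostnames, bucket names, and cluster ID are hypothetical placeholders, and the key names follow the classic Flink configuration style; newer Flink releases may spell some of them differently (for example `high-availability.type`), so check the documentation for your version.

```yaml
# Sketch only: hostnames, bucket paths, and cluster-id are made-up placeholders.

# Enable ZooKeeper-based high availability for the JobManager.
high-availability: zookeeper
high-availability.zookeeper.quorum: zk-1:2181,zk-2:2181,zk-3:2181

# ZooKeeper holds only pointers; the JobManager metadata they point to
# is written to this durable storage location.
high-availability.storageDir: s3://my-bucket/flink/ha/
high-availability.cluster-id: /my-flink-cluster

# RocksDB state backend with incremental checkpoints written to S3,
# matching the setup in the question.
state.backend: rocksdb
state.backend.incremental: true
state.checkpoints.dir: s3://my-bucket/flink/checkpoints/
```

With a configuration along these lines, a replacement JobManager that connects to the same ZooKeeper quorum and storage directory can look up the latest completed checkpoint instead of relying on the memory of the lost instance.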
