What happens to state in a Flink Task Manager when it crashes?


Question

What happens to the state stored in a Flink Task Manager when that Task Manager crashes? Say the state backend is RocksDB: would the data be transferred to the other running Task Managers, so that the complete state is available for processing?

Recommended answer

Flink does not (yet) support dynamic rescaling of state, so the failed task manager must be recovered, and the job will be restarted from a checkpoint.


Exactly what that involves depends on how your cluster is configured, and on whether the job failed because of an exception or because the machine or container running the task manager failed.

If you are using RocksDB and local recovery is enabled, then when the job dies because of an exception, all of the task managers can restart the job more or less immediately from their local copies of the state. If, on the other hand, a new task manager has to be spun up, then once it is running it fetches what it needs from the latest checkpoint (from whatever distributed file system is in use), and the job resumes.

Without local recovery, every task manager has to fetch its relevant portion of the checkpoint from the distributed file system (DFS).
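
As a rough illustration of the setup described above, a Flink configuration fragment that combines RocksDB, checkpoints on a DFS, and local recovery might look like the following. This is only a sketch: the key names are the ones used in Flink 1.x configuration files and may be named differently in your version, and the checkpoint path and interval are placeholder values that do not come from the question.

```yaml
# Sketch of a flink-conf.yaml fragment (Flink 1.x key names; verify against your version).

# Keep keyed state in RocksDB on each task manager's local disk.
state.backend: rocksdb

# Durable checkpoint location on a distributed file system.
# The path is a placeholder, not taken from the original question.
state.checkpoints.dir: hdfs:///flink/checkpoints

# Take a checkpoint periodically (the interval is an arbitrary example value).
execution.checkpointing.interval: 60s

# Keep a secondary local copy of checkpointed state, so that surviving task
# managers can restore from local disk instead of downloading from the DFS.
state.backend.local-recovery: true
```

With a setup along these lines, a task manager that is still alive after a failure can restore from its local copy, while a freshly started task manager has no local copy and reads its share of the state from the directory given in state.checkpoints.dir.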

In some cases it is possible to do something less expensive than a full recovery. See fine-grained recovery for details.
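
For reference, the less expensive recovery mentioned above is, to my understanding, controlled by the job manager's failover strategy. The fragment below is a hedged sketch using the Flink 1.x key name; check the documentation for your version before relying on it.

```yaml
# Sketch only (Flink 1.x key name; verify against your version).
# "region" restarts just the pipelined region affected by a failed task;
# "full" restarts every task in the job.
jobmanager.execution.failover-strategy: region
```

Whether this helps depends on the job graph: a job whose tasks are all connected by pipelined exchanges forms a single region, so a region restart is effectively a full restart.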
