What happens to state in a Flink Task Manager when it crashes?

Question

What happens to the state stored in a Flink Task Manager when that Task Manager crashes? Say the state backend is RocksDB: is the data transferred to the other running Task Managers so that the complete state is ready for data processing?

Recommended answer

Flink does not (yet) support dynamic rescaling of state, so the state is not redistributed to the surviving task managers. Instead, the failed task manager must be recovered, and the job is restarted from a checkpoint.

Exactly what that involves depends on how your cluster is configured, and on whether the job failed because of an exception or because the machine or container running the task manager failed.

If you are using RocksDB with local recovery enabled and the job died because of an exception, all of the task managers can restart the job more or less immediately from their local copies of the state. If, on the other hand, a new task manager has to be spun up, then once it is running it fetches what it needs from the latest checkpoint (from whatever distributed file system is in use), and then the job resumes.
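As a rough illustration of the setup described here, a configuration along the following lines turns on RocksDB, durable checkpoints, and local recovery. This is a sketch that assumes Flink 1.x key names in `flink-conf.yaml` (newer releases rename some of them); the checkpoint path and interval are made-up values, not recommendations.

```yaml
# flink-conf.yaml (assumed: Flink 1.x key names)
state.backend: rocksdb

# Durable checkpoint location on a distributed file system (hypothetical path)
state.checkpoints.dir: hdfs:///flink/checkpoints

# Take a checkpoint every 60 seconds (illustrative interval)
execution.checkpointing.interval: 60s

# Keep a secondary copy of the state on each task manager's local disk
state.backend.local-recovery: true
```

Local recovery only helps when the task manager process and its local disk survive the failure; the copy in the distributed file system remains the source of truth.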

Without local recovery, every task manager has to fetch the relevant portions of the checkpoint from the distributed file system (DFS).
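The cases above can be summarized as a small decision table. The following Python sketch is a toy model of those rules, not Flink's scheduler or any Flink API: `Failure`, `Source`, and `restore_sources` are hypothetical names, and it assumes that surviving task managers keep using their local copies when only one task manager is replaced.

```python
from enum import Enum


class Failure(Enum):
    JOB_EXCEPTION = "job exception"          # task manager processes survive
    TASK_MANAGER_LOST = "task manager lost"  # machine or container died


class Source(Enum):
    LOCAL_COPY = "local copy of the state"
    DFS_CHECKPOINT = "latest checkpoint in the DFS"


def restore_sources(task_managers, failure, lost=None, local_recovery=True):
    """Return {task manager name: Source} for the restart after a failure.

    Hypothetical helper: a name stands for a slot, so the replacement for
    a lost task manager is listed under the lost one's name.
    """
    sources = {}
    for tm in task_managers:
        # A freshly started replacement has no local state to reuse.
        replaced = failure is Failure.TASK_MANAGER_LOST and tm == lost
        if local_recovery and not replaced:
            sources[tm] = Source.LOCAL_COPY
        else:
            sources[tm] = Source.DFS_CHECKPOINT
    return sources


if __name__ == "__main__":
    result = restore_sources(
        ["tm-1", "tm-2"], Failure.TASK_MANAGER_LOST, lost="tm-2"
    )
    for name, source in result.items():
        print(f"{name}: restore from {source.value}")
```

In every branch the job restarts from the same checkpoint; the only thing that varies is whether a task manager reads that checkpoint's data from its own disk or over the network.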

In some cases it is possible to do something less expensive than a full recovery. See fine-grained recovery for details.
