Apache flink: Lazy load from save point for RocksDB backend


Question

We want to use Apache Flink with RocksDB backend (HDFS) for stateful stream processing. However, our application state (keyed state) will be in the order of terabytes.
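
For context, here is a minimal sketch of the kind of setup being described, using the pre-1.13 `RocksDBStateBackend` API; the HDFS namenode address and paths are placeholders, not from the question:

```java
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StatefulJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Keyed state lives in embedded RocksDB instances on the task managers'
        // local disks; checkpoints are written to HDFS. The second constructor
        // argument enables incremental checkpoints.
        env.setStateBackend(
                new RocksDBStateBackend("hdfs://namenode:8020/flink/checkpoints", true));
        env.enableCheckpointing(60_000); // checkpoint every 60 seconds

        // ... define sources, keyed operators, and sinks here ...
        env.execute("stateful-job");
    }
}
```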

From what I understand, when we restore a job from a savepoint, all of the operator state will be shipped from the savepoint location on HDFS to each of the task managers. If the state is in the order of terabytes and all of it needs to be transferred, every deployment will result in very long downtime.

I wanted to understand whether, in the case of RocksDB, it is possible to configure lazy loading, wherein keyed state is retrieved from HDFS as and when required, and then cached on the local disk.

Thanks!

Answer

If you are using RocksDB and configure your Flink cluster to use local recovery (covered in the Flink documentation on tuning applications with large state), then a copy of the RocksDB files will be kept on each task manager's local disk, and recovery will be almost immediate (except for any new nodes that have to be spun up).
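
As a rough sketch (not from the original answer), task-local recovery and incremental checkpoints can be enabled programmatically via the `Configuration` API available in Flink 1.12+; in practice these are usually set in flink-conf.yaml on the cluster instead:

```java
import org.apache.flink.configuration.CheckpointingOptions;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LocalRecoverySetup {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Equivalent flink-conf.yaml keys:
        //   state.backend.incremental: true
        //   state.backend.local-recovery: true
        conf.setBoolean(CheckpointingOptions.INCREMENTAL_CHECKPOINTS, true);
        conf.setBoolean(CheckpointingOptions.LOCAL_RECOVERY, true);

        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment(conf);
        // ... job definition ...
    }
}
```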

However, this doesn't really apply to savepoints, as this mechanism requires incremental snapshots to work well.

You may want to read that whole page of the docs, which is about how to configure and tune applications that use large amounts of state.

