RocksDBStateBackend in Flink: how exactly does it work?

Question

I have read the official Flink documentation on state backends. In particular, I am interested in the RocksDBStateBackend.

What I don't understand is this: if I enable this backend, do the TaskManagers access RocksDB through another node inside the Flink cluster?

What I have understood so far about the RocksDBStateBackend is that the TaskManagers store state in their own memory, i.e. the memory of the JVM process. Do they then send that state to be stored in RocksDB? If so, where is RocksDB inside the Flink cluster? Where is it physically?

Recommended answer

RocksDB is an embedded database. If you use RocksDB as your Flink state backend, each task manager has a local instance of RocksDB, which runs as a native (JNI) library inside the JVM process. With RocksDB, your state lives as serialized bytes on the local disk, with an in-memory (off-heap) cache.
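
To make the "embedded" part concrete, here is a toy model in Python. It is not Flink or RocksDB code: `LocalEmbeddedStore` and its one-file-per-key layout are hypothetical and chosen only for brevity. It illustrates the shape described above, assuming nothing more than the standard library: the store lives inside the same process as the task, values are kept as serialized bytes in a local directory, and a small in-memory cache sits in front of the disk.

```python
import pickle
import tempfile
from pathlib import Path


class LocalEmbeddedStore:
    """Toy stand-in for an embedded store: bytes on local disk plus a cache.

    Hypothetical illustration only; real RocksDB uses SST files and a
    block cache, not one file per key.
    """

    def __init__(self, local_dir):
        self.local_dir = Path(local_dir)
        self.local_dir.mkdir(parents=True, exist_ok=True)
        # key -> serialized bytes (stands in for the in-memory cache)
        self.cache = {}

    def _path(self, key):
        return self.local_dir / f"{key}.bin"

    def put(self, key, value):
        data = pickle.dumps(value)          # state is stored as serialized bytes
        self._path(key).write_bytes(data)   # ... on the local disk
        self.cache[key] = data

    def get(self, key):
        data = self.cache.get(key)
        if data is None:                    # cache miss: read from local disk
            data = self._path(key).read_bytes()
            self.cache[key] = data
        return pickle.loads(data)           # every read deserializes


def demo_local_store():
    with tempfile.TemporaryDirectory() as tmp:
        store = LocalEmbeddedStore(tmp)
        store.put("user-42", {"clicks": 3})
        store.cache.clear()                 # force the next read to hit the disk
        return store.get("user-42")
```

Nothing in this sketch talks to another node: reads and writes go to the cache or to the local directory, which is the point the answer makes about where RocksDB physically lives.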

During checkpointing, the SST files from RocksDB are copied from the local disk to the distributed file system where the checkpoint is stored. If the local recovery option is enabled, a local copy is retained as well, to speed up recovery. But it would not be safe to rely only on the local copy, because the local disk might be lost if the node fails. This is why checkpoints are always stored on a distributed file system.
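
The reasoning about node failure can be sketched with plain file copies. This is a simplified model, not Flink's checkpointing code: the `checkpoint` and `restore` functions, the directory names, and the file name `000001.sst` are all made up for illustration, and a temporary directory stands in for the distributed file system.

```python
import shutil
import tempfile
from pathlib import Path


def checkpoint(local_dir, checkpoint_dir):
    """Copy every state file from local disk to durable checkpoint storage."""
    checkpoint_dir = Path(checkpoint_dir)
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    for f in Path(local_dir).iterdir():
        shutil.copy2(f, checkpoint_dir / f.name)


def restore(checkpoint_dir, new_local_dir):
    """Rebuild local state, possibly on a different node, from the checkpoint."""
    new_local_dir = Path(new_local_dir)
    new_local_dir.mkdir(parents=True, exist_ok=True)
    for f in Path(checkpoint_dir).iterdir():
        shutil.copy2(f, new_local_dir / f.name)


def demo_node_failure():
    with tempfile.TemporaryDirectory() as root:
        root = Path(root)
        local = root / "node-a-local-disk"
        durable = root / "distributed-fs"   # stands in for the distributed file system
        local.mkdir()
        (local / "000001.sst").write_bytes(b"serialized state")

        checkpoint(local, durable)
        shutil.rmtree(local)                # node A fails and its disk is lost

        replacement = root / "node-b-local-disk"
        restore(durable, replacement)
        return (replacement / "000001.sst").read_bytes()
```

Had the only copy been the one on node A's disk, the `rmtree` step would have destroyed the state. A retained local copy helps only in the case where the same node comes back with its disk intact.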

The alternative to RocksDB is to use one of the heap-based state backends, in which case your state lives as objects on the JVM heap.
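
One practical consequence of "objects on the heap" versus "serialized bytes" can be shown with a small sketch. It uses Python dictionaries and `pickle` as stand-ins, so it is an analogy rather than Flink code, and the function name is hypothetical: a heap-style store hands back the stored object itself, while a bytes-style store hands back a freshly deserialized copy on every read.

```python
import pickle


def demo_heap_vs_serialized():
    counter = {"count": 1}

    heap_state = {"key": counter}                  # heap style: holds the object itself
    bytes_state = {"key": pickle.dumps(counter)}   # bytes style: holds serialized bytes

    heap_read = heap_state["key"]                  # same object as the stored one
    bytes_read = pickle.loads(bytes_state["key"])  # deserialized copy

    heap_read["count"] += 1    # changes the stored object directly
    bytes_read["count"] += 1   # changes only the copy; it must be written back to persist

    stored_in_heap = heap_state["key"]["count"]
    stored_as_bytes = pickle.loads(bytes_state["key"])["count"]
    return stored_in_heap, stored_as_bytes
```

The serialize and deserialize step on each access is the price of keeping state outside the heap; in exchange, the amount of state is bounded by local disk rather than by heap size.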
