Dataflow - State persistence?

Question

We are considering Beam/Dataflow for stateful processing, but we are a bit concerned about the limited visibility into the state backend. How is state persisted to disk when memory does not suffice? Is there an underlying database?

I heard about Windmill during a 2021 event [1], but a ticket from 2019 [2] refers to Persistent Disk.

Thanks!

[1] https://beamcollege.dev/
[2] Which persistent storage does Dataflow use to keep persistent state implemented with Apache Beam Timers?

Recommended answer

Windmill and persistent storage on disk are the same thing: Windmill stores pipeline state on Persistent Disks.

Windmill is a process that runs on user VMs in streaming Dataflow jobs. It is responsible for performing the streaming shuffle between workers and for persisting pipeline state and keeping it consistent.

(non-public source)

You can find more details on this stack.
