Share state among operators in Flink

Question

I wonder if it is possible in Flink to share the state among operators.

Say, for instance, that I have partitioning by key on an operator and I need a piece of state of partition A inside partition C (for any reason) (fig 1.a), or I need the state of operator C in downstream operator F (fig 1.b).

I know it is possible to broadcast records to all partitions. So, if you include the internal state of an operator inside the records, you can share that state with downstream operators.
However, this could be expensive compared with simply letting op1 ask for op2's state directly.

Are the recent developments around queryable state moving towards this concept, or are they meant only to let an external user query the internal state of the topology?

Thanks in advance for your insights.

Recommended answer

In general, Flink's design does not allow reading from or writing to the state of other subtasks, whether they belong to the same operator or to a different one. As you said, you can use broadcast to make state globally available. The queryable state feature is intended for external user queries.

However, I have heard of users who used this feature inside an operator to fetch data from other operators of the same job. I don't know how well this works (in terms of stability and performance). If you would like to try it out, I would point you to the user mailing list for a more in-depth technical discussion.
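
To make the broadcast option a bit more concrete, here is a small dependency-free model in plain Java. It is only a sketch of the idea and does not use the Flink API: the class names `Subtask` and `DownstreamOperator`, the field `recordsSent`, and the hash-based routing are all assumptions made up for illustration. It shows each parallel subtask keeping its own replica of the upstream state, so a record routed to partition C can read a value that originated in partition A, and it counts the copies sent to show where the cost comes from.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BroadcastStateSketch {

    /** One parallel instance of the downstream operator (hypothetical model, not a Flink class). */
    static final class Subtask {
        final int index;
        // Local replica of the upstream state; only broadcast records write to it.
        final Map<String, Long> replica = new HashMap<>();

        Subtask(int index) {
            this.index = index;
        }

        // Called once per broadcast record, on every subtask.
        void onBroadcast(String key, long value) {
            replica.put(key, value);
        }

        // Read the replicated value for any key, or null if nothing was broadcast yet.
        Long readShared(String key) {
            return replica.get(key);
        }
    }

    /** Hypothetical stand-in for an operator running with a fixed parallelism. */
    static final class DownstreamOperator {
        final List<Subtask> subtasks = new ArrayList<>();
        int recordsSent = 0; // counts copies, to make the broadcast cost visible

        DownstreamOperator(int parallelism) {
            for (int i = 0; i < parallelism; i++) {
                subtasks.add(new Subtask(i));
            }
        }

        // Broadcast: every subtask receives its own copy of the state update.
        void broadcast(String key, long value) {
            for (Subtask s : subtasks) {
                s.onBroadcast(key, value);
                recordsSent++;
            }
        }

        // Keyed routing: a record with this key is handled by exactly one subtask.
        // Simplified hashing, assumed for this sketch only.
        Subtask route(String key) {
            return subtasks.get(Math.floorMod(key.hashCode(), subtasks.size()));
        }
    }

    public static void main(String[] args) {
        DownstreamOperator opF = new DownstreamOperator(3);
        opF.broadcast("A", 42L); // upstream shares the state it holds for key A
        Subtask partitionC = opF.route("C"); // subtask that owns key C
        System.out.println("state of A seen from partition C: " + partitionC.readShared("A"));
        System.out.println("copies sent for one update: " + opF.recordsSent);
    }
}
```

In a real job the same role is played by Flink's broadcast state pattern (a `MapStateDescriptor`, `DataStream#broadcast`, and a `KeyedBroadcastProcessFunction`). The model above only illustrates why every update costs one copy per parallel subtask, which is the expense the question is worried about.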
