在 Flink 流中使用静态 DataSet 丰富 DataStream [英] Enriching DataStream using static DataSet in Flink streaming

查看:35
本文介绍了在 Flink 流中使用静态 DataSet 丰富 DataStream的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在编写一个 Flink 流程序,其中我需要使用一些静态数据集(信息库,IB)来丰富用户事件的 DataStream.

I am writing a Flink streaming program in which I need to enrich a DataStream of user events using some static data set (information base, IB).

例如假设我们有一个买家的静态数据集,我们有一个传入的事件点击流,对于每个事件,我们想要添加一个布尔标志,指示事件的执行者是否是买家.

For E.g. Let's say we have a static data set of buyers and we have an incoming clickstream of events, for each event we want to add a boolean flag indicating whether the doer of the event is a buyer or not.

实现此目的的理想方法是按用户 ID 对传入流进行分区,让数据集中的买方设置再次按用户 ID 进行分区,然后在此数据集中查找流中的每个事件.

An ideal way to achieve this would be to partition the incoming stream by user id, have the buyers set available in a DataSet partitioned again by user id and then do a look up for each event in the stream into this DataSet.

由于 Flink 不允许在流式程序中使用 DataSets,我该如何实现上述功能?

Since Flink does not allow using DataSets in a streaming program, how can I achieve the above ?

另一种选择可能是使用托管运营商状态来存储买家集,但我如何保持此状态按用户 ID 分发,以避免在单个事件查找中进行网络输入/输出?在内存状态后端的情况下,状态是由某个键保持分布,还是在所有操作员子任务中复制?

Another option could be to use Managed Operator State to store buyers set, but how can I keep this state distributed by user id so as to avoid network i/o in individual event look ups ? In case of memory state backend, does state remain distributed by some key, or is it replicated across all operator subtasks ?

在 Flink 流程序中实现上述丰富需求的正确设计模式是什么?

What is the right design pattern to achieve the above enriching requirement in a Flink streaming program ?

推荐答案

我会通过 user_id 对流进行键控,并使用 RichFlatMap 进行充实.在 RichFlatMap 的 open() 方法中,您可以为该用户加载静态买家标志并将其缓存在布尔字段中.

I would key the stream by user_id, and use a RichFlatMap to do the enrichment. In the open() method of the RichFlatMap you can load the static buyer flag for that user and keep it cached in a boolean field.

这篇关于在 Flink 流中使用静态 DataSet 丰富 DataStream的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆