Enriching DataStream using static DataSet in Flink streaming


Problem description

I am writing a Flink streaming program in which I need to enrich a DataStream of user events using some static data set (information base, IB).

For example, say we have a static data set of buyers and an incoming clickstream of events. For each event we want to add a boolean flag indicating whether the user who triggered the event is a buyer.

An ideal way to achieve this would be to partition the incoming stream by user id, keep the buyers set in a DataSet that is also partitioned by user id, and then look up each event from the stream in this DataSet.

Since Flink does not allow using DataSets in a streaming program, how can I achieve the above?

Another option could be to use Managed Operator State to store the buyers set, but how can I keep this state distributed by user id so as to avoid network I/O in individual event lookups? With the memory state backend, does state remain distributed by some key, or is it replicated across all operator subtasks?

What is the right design pattern to achieve the above enrichment requirement in a Flink streaming program?

Solution

I would key the stream by user_id, and use a RichFlatMap to do the enrichment. In the open() method of the RichFlatMap you can load the static buyer flag for that user and keep it cached in a boolean field.
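
The idea can be sketched in plain Java without any Flink dependency. This is only an illustration under stated assumptions, not the Flink API: `BuyerEnricher` is a hypothetical stand-in for a class that would extend Flink's `RichFlatMapFunction`, its `open()` and `flatMap()` methods only mirror that lifecycle, and `subtaskFor` is a made-up partitioner (Flink's real `keyBy` assigns keys through key groups, not this formula). Each simulated subtask loads just the slice of the buyers set whose ids route to it, so lookups stay local.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BuyerEnrichmentSketch {

    /** A click event plus the flag added during enrichment. */
    static final class EnrichedEvent {
        final String userId;
        final String action;
        final boolean isBuyer;

        EnrichedEvent(String userId, String action, boolean isBuyer) {
            this.userId = userId;
            this.action = action;
            this.isBuyer = isBuyer;
        }

        @Override
        public String toString() {
            return userId + "," + action + ",isBuyer=" + isBuyer;
        }
    }

    /**
     * Hypothetical stand-in for a RichFlatMapFunction: open() runs once per
     * parallel subtask, flatMap() runs once per event routed to that subtask.
     */
    static final class BuyerEnricher {
        private final int parallelism;
        private final int subtaskIndex;
        private Set<String> localBuyers; // slice of the static data owned by this subtask

        BuyerEnricher(int parallelism, int subtaskIndex) {
            this.parallelism = parallelism;
            this.subtaskIndex = subtaskIndex;
        }

        /** Assumed partitioner for this sketch only; not Flink's key-group logic. */
        static int subtaskFor(String userId, int parallelism) {
            return Math.floorMod(userId.hashCode(), parallelism);
        }

        /** Cache only the buyers whose ids are routed to this subtask. */
        void open(Iterable<String> allBuyers) {
            localBuyers = new HashSet<>();
            for (String id : allBuyers) {
                if (subtaskFor(id, parallelism) == subtaskIndex) {
                    localBuyers.add(id);
                }
            }
        }

        /** Emit the event with the buyer flag looked up from the local cache. */
        void flatMap(String userId, String action, List<EnrichedEvent> out) {
            out.add(new EnrichedEvent(userId, action, localBuyers.contains(userId)));
        }
    }

    /** Routes each {userId, action} pair to its subtask, as keying by user_id would. */
    static List<EnrichedEvent> run(List<String[]> events, Set<String> buyers, int parallelism) {
        BuyerEnricher[] subtasks = new BuyerEnricher[parallelism];
        for (int i = 0; i < parallelism; i++) {
            subtasks[i] = new BuyerEnricher(parallelism, i);
            subtasks[i].open(buyers);
        }
        List<EnrichedEvent> out = new ArrayList<>();
        for (String[] e : events) {
            int target = BuyerEnricher.subtaskFor(e[0], parallelism);
            subtasks[target].flatMap(e[0], e[1], out);
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> buyers = Set.of("alice", "carol");
        List<String[]> events = List.of(
                new String[] {"alice", "click"},
                new String[] {"bob", "view"},
                new String[] {"carol", "click"});
        for (EnrichedEvent e : run(events, buyers, 4)) {
            System.out.println(e);
        }
    }
}
```

In an actual Flink job the routing in `run` would be done by keying the stream on user_id, and a subtask does not trivially know in `open()` which keys it will receive, so a real implementation might instead load the flag on the first event for a key and keep it in keyed state.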
