Apache Flink add new stream dynamically


Question

Is it possible in Apache Flink to add a new datastream dynamically at runtime, without restarting the job?

As far as I understand, a usual Flink program looks like this:

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

val env = StreamExecutionEnvironment.getExecutionEnvironment
val text = env.socketTextStream(hostname, port, "\n")
val windowCounts = text
  .flatMap(_.split("\\s"))
  .map((_, 1))
  .keyBy(0)
  .timeWindow(Time.seconds(5))
  .sum(1)

env.execute("Socket Window WordCount")

In my case it is possible that, e.g., a new device is started and therefore another stream must be processed. But how can this new stream be added on the fly?

Answer

It is not possible to add new streams to a Flink program at runtime.

The way to solve this problem is to have a single stream which contains all incoming events (e.g. a Kafka topic into which you ingest all the individual streams). The events should carry a key identifying which stream they come from. This key can then be used to keyBy the stream and to apply per-key processing logic.
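The idea can be sketched in plain Java, without the Flink runtime: all devices share one merged event stream, each event carries a key (here a hypothetical `deviceId` field, an assumption for illustration), and aggregation runs independently per key — the same effect that `keyBy(...)` followed by a per-key operator gives you in an actual Flink job.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical event type: the key identifies the originating device/stream.
record Event(String deviceId, int value) {}

class PerKeyDemo {
    // Aggregate events independently per device, analogous to
    // keyBy(deviceId) followed by sum(...) on the merged stream.
    static Map<String, Integer> sumPerDevice(List<Event> merged) {
        Map<String, Integer> sums = new HashMap<>();
        for (Event e : merged) {
            sums.merge(e.deviceId(), e.value(), Integer::sum);
        }
        return sums;
    }
}
```

When a new device appears, no new stream or topology change is needed: its events simply show up in the merged stream under a new key.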

If you want to read from multiple sockets, you could write your own SourceFunction which reads, from some input (e.g. a fixed control socket), the ports for which to open sockets. Internally you would then keep all these sockets open and read from them in a round-robin fashion.
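The round-robin part of that idea can be sketched in plain Java (the `RoundRobinReader` class and iterator-based "inputs" are illustrative stand-ins for open sockets, not Flink API): a registry of inputs that can grow at runtime, with reads served from the registered inputs in turn. In a real job, this logic would live inside the custom SourceFunction's run loop.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Registry of open inputs (stand-ins for sockets) that can grow at runtime;
// read() serves elements in round-robin order, skipping exhausted inputs.
class RoundRobinReader<T> {
    private final List<Iterator<T>> inputs = new ArrayList<>();
    private int next = 0;

    // Called when a new "socket" (e.g. a newly started device) is announced.
    void register(Iterator<T> input) {
        inputs.add(input);
    }

    // Returns the next element in round-robin order, or null if all
    // currently registered inputs are exhausted.
    T read() {
        for (int tried = 0; tried < inputs.size(); tried++) {
            Iterator<T> it = inputs.get(next % inputs.size());
            next++;
            if (it.hasNext()) {
                return it.next();
            }
        }
        return null;
    }
}
```

Registering two inputs producing "a1", "a2" and "b1" yields the interleaving a1, b1, a2 — each input gets a fair turn, and new inputs can be registered between reads.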

