Apache Flink add new stream dynamically
Problem description
Is it possible in Apache Flink to add a new datastream dynamically at runtime, without restarting the job?
As far as I understand, a usual Flink program looks like this:
import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment
val text = env.socketTextStream(hostname, port, '\n')
val windowCounts = text.map(...) // further transformations elided
env.execute("Socket Window WordCount")
In my case it is possible that, for example, a new device is started and therefore another stream has to be processed. But how can this new stream be added on the fly?
Recommended answer
It is not possible to add new streams at runtime to a Flink program.
The way to solve this problem is to have a single stream which contains all incoming events (e.g. a Kafka topic into which you ingest all the individual streams). The events should carry a key identifying which stream they come from. This key can then be used to keyBy the stream and to apply per-key processing logic.
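As an illustration (not part of the original answer), here is a minimal Scala sketch of that pattern. The DeviceEvent type, the "deviceId,value" line format, and the socket source are hypothetical stand-ins; the socket merely represents the single ingestion point (e.g. the Kafka topic) described above:

import org.apache.flink.streaming.api.scala._

// Hypothetical event type: each record carries the id of the stream/device it belongs to.
case class DeviceEvent(deviceId: String, value: Double)

object PerDeviceJob {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Stand-in source for the single ingestion point (e.g. a Kafka topic all devices write to);
    // each line is assumed to look like "deviceId,value".
    val allEvents: DataStream[DeviceEvent] = env
      .socketTextStream("localhost", 9999)
      .map { line =>
        val Array(id, v) = line.split(",")
        DeviceEvent(id, v.toDouble)
      }

    // Partition by the stream/device id; a previously unseen device simply shows up
    // as a new key group, no job restart required.
    val perDevice = allEvents
      .keyBy(_.deviceId)
      .reduce((a, b) => DeviceEvent(a.deviceId, a.value + b.value)) // per-key logic, e.g. a running sum

    perDevice.print()
    env.execute("Per-device processing")
  }
}

Because the partitioning happens per key, events from a new device automatically form a new key group while the job keeps running unchanged.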
If you want to read from multiple sockets, you could write your own SourceFunction which reads from some input (e.g. from a fixed socket) the ports to open sockets for. Internally, you could then keep all these sockets open and read from them in a round-robin fashion.
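A rough sketch of such a custom source is shown below; it is an assumption-laden illustration rather than code from the answer. It assumes a hypothetical control socket on which new port numbers are announced one per line; checkpointing, reconnection and error handling are left out:

import java.io.{BufferedReader, InputStreamReader}
import java.net.Socket

import org.apache.flink.streaming.api.functions.source.SourceFunction

import scala.collection.mutable

// Hypothetical source: a control socket announces new ports (one per line); for each
// announced port a data socket is opened, and all open sockets are polled round-robin.
class MultiSocketSource(controlHost: String, controlPort: Int)
  extends SourceFunction[String] {

  @volatile private var running = true

  override def run(ctx: SourceFunction.SourceContext[String]): Unit = {
    val control = new Socket(controlHost, controlPort)
    val controlIn = new BufferedReader(new InputStreamReader(control.getInputStream))
    val dataReaders = mutable.ArrayBuffer.empty[BufferedReader]

    while (running) {
      // Check the control socket for newly announced ports.
      if (controlIn.ready()) {
        val port = controlIn.readLine().trim.toInt
        val dataSocket = new Socket(controlHost, port)
        dataReaders += new BufferedReader(new InputStreamReader(dataSocket.getInputStream))
      }

      // Round-robin over all currently open data sockets.
      for (reader <- dataReaders if reader.ready()) {
        val line = reader.readLine()
        if (line != null) ctx.collect(line)
      }

      Thread.sleep(50) // avoid busy-spinning when no socket has data
    }
  }

  override def cancel(): Unit = { running = false }
}

Such a source could then be attached with env.addSource(new MultiSocketSource("localhost", 9000)), where host and port are placeholders.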