Apache Flink add new stream dynamically
Question
Is it possible in Apache Flink to add a new datastream dynamically at runtime, without restarting the job?
As far as I understand, a typical Flink program looks like this:
val env = StreamExecutionEnvironment.getExecutionEnvironment()
val text = env.socketTextStream(hostname, port, "\n")
val windowCounts = text.map...
env.execute("Socket Window WordCount")
In my case it is possible that, for example, a new device is started and therefore another stream must be processed. But how can this new stream be added on the fly?
Answer
It is not possible to add new streams to a Flink program at runtime.
The way to solve this problem is to have a single stream that contains all incoming events (e.g. a Kafka topic into which you ingest all of the individual streams). Each event should carry a key identifying which stream it came from. This key can then be used to keyBy the stream and apply per-key processing logic.
If you want to read from multiple sockets, you could write your own SourceFunction that reads from some input (e.g. from a fixed control socket) the ports for which to open a socket. Internally you could then keep all of these sockets open and read from them in a round-robin fashion.
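The round-robin part of that custom source can be sketched in plain Scala. This is not a real SourceFunction implementation: each "socket" is modeled as an `Iterator` of already-received elements, which is an assumption made purely to show the read order; a real source would block on the sockets and loop forever.

```scala
// Sketch: round-robin reading from several open sources (plain Scala, no Flink).
// Each source is modeled as an Iterator of lines; readRoundRobin pulls one
// element from every source that still has data, in turn, until all are drained.
object RoundRobinReader {
  def readRoundRobin[A](sources: Seq[Iterator[A]]): List[A] = {
    val out = scala.collection.mutable.ListBuffer.empty[A]
    var open = sources.filter(_.hasNext)   // only sources with pending data
    while (open.nonEmpty) {
      open.foreach(it => if (it.hasNext) out += it.next()) // one element each
      open = open.filter(_.hasNext)        // drop exhausted sources
    }
    out.toList
  }
}
```

Elements interleave one per source per round, so no single socket can starve the others; in a real SourceFunction you would additionally add newly announced ports to `open` while running.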