Dynamically throttle Flink Kafka sources
Question
We're consuming multiple Kafka topics but want to give some of them precedence over the others (roughly, Quality of Service).
According to what I've found online, the consensus is to throttle not in the operators but in the source, more specifically in the deserializer [1].
How can we access information about the state of the streaming environment (i.e. how far each topic lags behind the current offset) from within the source?
Currently, we plan to convert our whole setup to CoFlatMaps [2] and add a control stream that emits the current offset lag for all topics. Operators on the low-precedence streams would then sleep according to the lag of the high-precedence streams.
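To make the plan above concrete, here is a minimal sketch of the sleep-by-lag logic in plain Java, without any Flink dependency. The class `LagAwareThrottle`, its method names, the threshold, and the "one millisecond per excess record" rule are all assumptions made for illustration. In a real job, `onControl` and `onData` would correspond to the `flatMap1` and `flatMap2` callbacks of a `CoFlatMapFunction`, and the output list would be Flink's `Collector`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Flink class): holds the latest lag reported by
// the control stream and delays low-precedence records accordingly.
class LagAwareThrottle {
    private final long lagThreshold;   // lag below this value causes no delay (assumed policy)
    private final long maxSleepMillis; // upper bound on the delay per record
    private volatile long highPriorityLag = 0L;

    LagAwareThrottle(long lagThreshold, long maxSleepMillis) {
        this.lagThreshold = lagThreshold;
        this.maxSleepMillis = maxSleepMillis;
    }

    // Control side: would be called from flatMap1 with the reported offset lag.
    void onControl(long reportedLag) {
        highPriorityLag = Math.max(0L, reportedLag);
    }

    // Assumed policy: 1 ms of delay per record of lag above the threshold, capped.
    long currentSleepMillis() {
        long excess = highPriorityLag - lagThreshold;
        return excess <= 0 ? 0L : Math.min(maxSleepMillis, excess);
    }

    // Data side: would be called from flatMap2 for each low-precedence record.
    <T> void onData(T record, List<T> out) {
        long sleep = currentSleepMillis();
        if (sleep > 0) {
            try {
                Thread.sleep(sleep); // blocks the calling thread, which is what creates backpressure
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
            }
        }
        out.add(record);
    }
}
```

Sleeping inside an operator blocks that task's thread, so the delay reaches the Kafka source only indirectly through backpressure. Each parallel instance of the low-precedence operator would also need to see the control records, for example by broadcasting the control stream.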
How would you solve this problem? Tl;dr: Is there a way to share information across the sources/deserializers of a TaskManager?
Recommended answer
For anyone who needs an answer to this question: I ran into a similar topic while looking at backpressure in Flink. I found that people do their rate limiting in the source operator and in the (de)serialization step.
There is an example in the Flink GitHub repo: https://github.com/apache/flink/blob/master/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/utils/ThrottledIterator.java
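For readers who want the general idea without opening the link, below is a small stand-alone sketch of rate limiting at the source side by pacing an iterator. It is not a copy of the linked class. The name `RateLimitedIterator` and the fixed `elementsPerSecond` parameter are assumptions made for this illustration.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical wrapper: hands out elements of the underlying iterator
// no faster than the configured rate by sleeping in next().
class RateLimitedIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final long nanosPerElement;
    private long nextAllowedNanos; // earliest time the next element may be emitted

    RateLimitedIterator(Iterator<T> source, long elementsPerSecond) {
        if (elementsPerSecond <= 0) {
            throw new IllegalArgumentException("elementsPerSecond must be positive");
        }
        this.source = source;
        this.nanosPerElement = 1_000_000_000L / elementsPerSecond;
        this.nextAllowedNanos = System.nanoTime(); // first element is emitted immediately
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public T next() {
        long now = System.nanoTime();
        long waitNanos = nextAllowedNanos - now;
        if (waitNanos > 0) {
            try {
                Thread.sleep(waitNanos / 1_000_000L, (int) (waitNanos % 1_000_000L));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
            }
        }
        // Schedule the following element one interval after this one.
        nextAllowedNanos = Math.max(now, nextAllowedNanos) + nanosPerElement;
        return source.next();
    }
}
```

A usage example: `new RateLimitedIterator<>(Arrays.asList(1, 2, 3).iterator(), 100)` yields at most roughly 100 elements per second. To throttle dynamically, the fixed rate could be replaced by a value that is updated at runtime, for instance from the lag of the high-precedence topics.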