Aggregation and state store retention in Kafka Streams


Question


I have a use case like the following. For each incoming event, I want to look at a certain field to see if its status has changed from A to B and, if so, send the event to an output topic. The flow is like this: an event with key "xyz" comes in with status A, and some time later another event comes in with key "xyz" with status B. I have this code using the high-level DSL.

final KStream<String, DomainEvent> inputStream = ...

final KStream<String, DomainEvent> outputStream = inputStream
        .map((k, v) -> new KeyValue<>(v.getId(), v))
        .groupByKey(Serialized.with(Serdes.String(), jsonSerde))
        .aggregate(DomainStatusMonitor::new,
                (k, v, aggregate) -> {
                    aggregate.updateStatusMonitor(v);
                    return aggregate;
                },
                Materialized.with(Serdes.String(), jsonSerde))
        .toStream()
        .filter((k, v) -> v.isStatusChangedFromAtoB())
        .map((k, v) -> new KeyValue<>(k, v.getDomainEvent()));


Is there a better way to write this logic using the DSL?


A couple of questions regarding the state store created by the aggregation in the code above:

  1. Is it creating an in-memory state store by default?
  2. What will happen if I have an unbounded number of unique incoming keys? If it is using an in-memory store by default, don't I need to switch to a persistent store? How do we handle situations like that in the DSL?
  3. If the state store is very large (either in-memory or persistent), how does it affect the startup time? How can I make the stream processing wait so that the store gets fully initialized? Or will Kafka Streams ensure that the state store is fully initialized before any incoming events are processed?

Thanks in advance!

Answer

  1. By default, a persistent RocksDB store is used. If you want to use an in-memory store instead, pass in Materialized.as(Stores.inMemoryKeyValueStore(...)).
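A minimal sketch of how the aggregation from the question could be switched to an explicitly named in-memory store (the store name "status-monitor-store" is an assumption, not from the question; jsonSerde and DomainStatusMonitor are the question's own types):

```java
// Sketch only: same aggregation as in the question, but with the default
// RocksDB store swapped for a named in-memory store.
KTable<String, DomainStatusMonitor> monitors = inputStream
        .map((k, v) -> new KeyValue<>(v.getId(), v))
        .groupByKey(Serialized.with(Serdes.String(), jsonSerde))
        .aggregate(DomainStatusMonitor::new,
                (k, v, aggregate) -> {
                    aggregate.updateStatusMonitor(v);
                    return aggregate;
                },
                Materialized.<String, DomainStatusMonitor>as(
                                Stores.inMemoryKeyValueStore("status-monitor-store"))
                        .withKeySerde(Serdes.String())
                        .withValueSerde(jsonSerde));
```

Note that an in-memory store is still backed by a changelog topic for fault tolerance, so durability is not lost; only the local storage medium changes.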


  2. If you have an unbounded number of unique keys, you will eventually run out of main memory or disk and your application will die. Depending on your semantics, you can get a "TTL" that expires old keys by using a session-windowed aggregation with a large "gap" parameter instead.
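A sketch of that session-window "TTL" idea, assuming a 7-day inactivity gap (the gap value and the DomainStatusMonitor.merge() helper are assumptions for illustration):

```java
// Sketch only: keys that see no new event within the gap start a new session,
// and old sessions are eventually dropped when the window retention passes.
final KStream<Windowed<String>, DomainStatusMonitor> expiring = inputStream
        .map((k, v) -> new KeyValue<>(v.getId(), v))
        .groupByKey(Serialized.with(Serdes.String(), jsonSerde))
        .windowedBy(SessionWindows.with(TimeUnit.DAYS.toMillis(7)))  // "gap"
        .aggregate(DomainStatusMonitor::new,
                (k, v, agg) -> {
                    agg.updateStatusMonitor(v);
                    return agg;
                },
                // Session merger: combines two sessions that the gap has joined.
                // merge() is a hypothetical method on DomainStatusMonitor.
                (k, agg1, agg2) -> agg1.merge(agg2),
                Materialized.with(Serdes.String(), jsonSerde))
        .toStream();
```

The key of the resulting stream becomes Windowed<String>, so downstream code would read the original key via key.key().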


  3. The state will always be restored before any new data is processed. If you use an in-memory store, this happens by consuming the underlying changelog topic; depending on the size of your state, this can take a while. If you use a persistent RocksDB store, the state is loaded from disk, so no restore is required and processing starts immediately. Only if you lose the state on the local disk will a restore from the changelog topic be needed.
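If you want visibility into that restore phase, a restore listener can be registered on the KafkaStreams instance; a minimal sketch (builder and props are assumed to be configured elsewhere):

```java
// Sketch only: log restore progress. The application only starts processing
// records once restoration of its state stores has completed.
KafkaStreams streams = new KafkaStreams(builder.build(), props);
streams.setGlobalStateRestoreListener(new StateRestoreListener() {
    @Override
    public void onRestoreStart(TopicPartition tp, String storeName,
                               long startOffset, long endOffset) {
        System.out.printf("Restoring %s: offsets %d..%d%n",
                storeName, startOffset, endOffset);
    }

    @Override
    public void onBatchRestored(TopicPartition tp, String storeName,
                                long batchEndOffset, long numRestored) {
        // Called after each restored batch; useful for progress metrics.
    }

    @Override
    public void onRestoreEnd(TopicPartition tp, String storeName,
                             long totalRestored) {
        System.out.printf("Restored %s (%d records)%n", storeName, totalRestored);
    }
});
streams.start();
```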
