KStream批处理窗口 [英] KStream batch process windows

查看:188
本文介绍了KStream批处理窗口的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想使用KStream接口批量处理邮件.

I want to batch messages with KStream interface.

我有一个包含键/值的流 我试图将它们收集在翻滚的窗口中,然后想立即处理整个窗口.

I have a Stream with Keys/values I tried to collect them in a tumbling window and then I wanted to process the complete window at once.

builder.stream(longSerde, updateEventSerde, CONSUME_TOPIC)
                .aggregateByKey(
                        HashMap::new,
                        (aggKey, value, aggregate) -> {
                            aggregate.put(value.getUuid, value);
                            return aggregate;
                        },
                        TimeWindows.of("intentWindow", 100),
                        longSerde, mapSerde)
                .foreach((wk, values) -> {

问题是每次对KTable进行更新时都会调用foreach. 一旦完成,我想处理整个窗口.如在100毫秒内收集数据,然后立即进行处理.在每个.

The thing is foreach gets called on each update to the KTable. I would like to process the whole window once it is complete. As in collect Data from 100 ms and then process at once. In for each.

16:** - windows from 2016-08-23T10:56:26 to 2016-08-23T10:56:27, key 2016-07-21T14:38:16.288, value count: 294
16:** - windows from 2016-08-23T10:56:26 to 2016-08-23T10:56:27, key 2016-07-21T14:38:16.288, value count: 295
16:** - windows from 2016-08-23T10:56:26 to 2016-08-23T10:56:27, key 2016-07-21T14:38:16.288, value count: 296
16:** - windows from 2016-08-23T10:56:26 to 2016-08-23T10:56:27, key 2016-07-21T14:38:16.288, value count: 297
16:** - windows from 2016-08-23T10:56:26 to 2016-08-23T10:56:27, key 2016-07-21T14:38:16.288, value count: 298
16:** - windows from 2016-08-23T10:56:26 to 2016-08-23T10:56:27, key 2016-07-21T14:38:16.288, value count: 299
16:** - windows from 2016-08-23T10:56:27 to 2016-08-23T10:56:28, key 2016-07-21T14:38:16.288, value count: 1
16:** - windows from 2016-08-23T10:56:27 to 2016-08-23T10:56:28, key 2016-07-21T14:38:16.288, value count: 2
16:** - windows from 2016-08-23T10:56:27 to 2016-08-23T10:56:28, key 2016-07-21T14:38:16.288, value count: 3
16:** - windows from 2016-08-23T10:56:27 to 2016-08-23T10:56:28, key 2016-07-21T14:38:16.288, value count: 4
16:** - windows from 2016-08-23T10:56:27 to 2016-08-23T10:56:28, key 2016-07-21T14:38:16.288, value count: 5
16:** - windows from 2016-08-23T10:56:27 to 2016-08-23T10:56:28, key 2016-07-21T14:38:16.288, value count: 6

有时,新窗口将从地图中的1个条目开始. 所以我什至不知道窗户什么时候满了.

at some point the new window starts with 1 entry in the map. So I don't even know when the window is full.

在kafka流中进行批处理的任何提示

any hints to to batch process in kafka streams

推荐答案

现在(从Kafka 0.10.0.0/0.10.0.1开始):您所描述的加窗行为是按预期工作".也就是说,如果您收到1,000条传入消息,则(当前)将始终看到最新版本的Kafka/Kafka Streams向下游进行1,000条更新.

Right now (as of Kafka 0.10.0.0 / 0.10.0.1): The windowing behavior you are describing is "working as expected". That is, if you are getting 1,000 incoming messages, you will (currently) always see 1,000 updates going downstream with the latest versions of Kafka / Kafka Streams.

展望未来:Kafka社区正在开发新功能,以使此更新速率行为更加灵活(例如,允许您将上述行为描述为所需的行为).参见 KIP- 63:统一流中的存储和下游缓存以获取更多详细信息.

Looking ahead: The Kafka community is working on new features to make this update-rate behavior more flexible (e.g. to allow for what you described above as your desired behavior). See KIP-63: Unify store and downstream caching in streams for more details.

这篇关于KStream批处理窗口的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆