How to process a KStream in a batch of max size or fallback to a time window?


Problem description

I would like to create a Kafka Streams-based application that processes a topic and takes messages in batches of size X (e.g. 50), but if the stream has low traffic, gives me whatever the stream has within Y seconds (e.g. 5).

So, instead of processing messages one by one, I would process a List[Record] where the size of the list is 50 (or possibly fewer).

This is to make some I/O-bound processing more efficient.

I know that this can be implemented with the classic Kafka consumer API, but I was looking for a stream-based implementation that also handles offset committing natively, taking errors/failures into account. I couldn't find anything related in the docs or by searching around, and was wondering if anyone has a solution to this problem.

Recommended answer

The simplest way might be to use a stateful transform() operation. Each time you receive a record, you put it into the store. Once you have received 50 records, you do your processing, emit the output, and delete the records from the store.

To enforce processing if you haven't reached the limit within a certain amount of time, you can register a wall-clock punctuation.
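The flush policy described above can be sketched independently of the Kafka Streams API. The following is a minimal, hypothetical sketch (the `BatchingBuffer` class and its method names are illustrative, not part of any library): `add` plays the role of the per-record `transform()` call, and `onPunctuate` plays the role of the scheduled wall-clock punctuator. The clock is injected so the time-based flush can be exercised deterministically.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

// Illustrative sketch of the batch-or-timeout flush policy.
// In a real topology the buffer would live in a state store.
class BatchingBuffer<R> {
    private final int maxSize;                 // flush when this many records are buffered
    private final long maxAgeMs;               // or when the oldest record is this old
    private final LongSupplier clock;          // injectable wall clock (ms) for testing
    private final Consumer<List<R>> processor; // receives each completed batch
    private final List<R> buffer = new ArrayList<>();
    private long oldestMs = -1;

    BatchingBuffer(int maxSize, long maxAgeMs,
                   LongSupplier clock, Consumer<List<R>> processor) {
        this.maxSize = maxSize;
        this.maxAgeMs = maxAgeMs;
        this.clock = clock;
        this.processor = processor;
    }

    // Analogous to transform(): called once per incoming record.
    void add(R record) {
        if (buffer.isEmpty()) {
            oldestMs = clock.getAsLong(); // remember when this batch started
        }
        buffer.add(record);
        if (buffer.size() >= maxSize) {
            flush();
        }
    }

    // Analogous to a wall-clock punctuator: called on a fixed schedule,
    // flushes a partial batch once it has waited long enough.
    void onPunctuate() {
        if (!buffer.isEmpty() && clock.getAsLong() - oldestMs >= maxAgeMs) {
            flush();
        }
    }

    private void flush() {
        processor.accept(new ArrayList<>(buffer)); // hand out a copy of the batch
        buffer.clear();
        oldestMs = -1;
    }
}
```

In an actual Kafka Streams application, the equivalent wiring would be a `Transformer` backed by a state store, with the punctuator registered via `ProcessorContext.schedule(...)` using `PunctuationType.WALL_CLOCK_TIME`.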

