How can I consume data sequentially (in timestamp order) from a multi-partitioned Kafka topic?

Question

I know that Kafka cannot guarantee the ordering of data when a topic has multiple partitions. My problem is this: I need multiple partitions on an event topic (user activities generate the events), because I want multiple consumer groups to consume the data from the topic. But there are times when I need to bootstrap the entire data set, i.e., read all the data from beginning to end and rebuild my graph of events from the historical messages in Kafka. When I do that I lose the ordering, which causes problems. One approach might be to process the data in a Map-Reduce style, where I map the data by time, sort it, and then consume it. Has anybody faced a similar situation who could help me with the right approach?

Thanks in advance.

Recommended answer

According to the Kafka documentation, global ordering across partitions is not guaranteed; ordering is guaranteed only within a single partition. You can therefore create N partitions with N consumers and partition by type of data, i.e., all data of category A should go to one partition. Because message order is maintained within a partition, you can consume each partition's messages in a separate consumer and process the data in order.
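The idea of routing by category can be illustrated without a broker. The sketch below is only a simulation: `partition_for`, `route`, and `events_for` are hypothetical helpers, `NUM_PARTITIONS` is an assumed value, and `zlib.crc32` is a stand-in for whatever hash the real producer partitioner applies to the message key. It shows that when every event of one category (here, one user) carries the same key, those events land in the same partition and keep their relative order.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # assumed partition count, for illustration only


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with a stable hash (a stand-in, not Kafka's own partitioner)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events):
    """Append each (key, payload) event to the partition selected by its key."""
    partitions = defaultdict(list)
    for key, payload in events:
        partitions[partition_for(key)].append((key, payload))
    return partitions


# Interleaved events from two users, in the order they were produced.
events = [
    ("user-a", "login"),
    ("user-b", "login"),
    ("user-a", "view"),
    ("user-b", "logout"),
    ("user-a", "logout"),
]
partitions = route(events)


def events_for(key: str):
    """Read back one key's payloads from the single partition that holds them."""
    return [payload for k, payload in partitions[partition_for(key)] if k == key]


print(events_for("user-a"))
print(events_for("user-b"))
```

Per-key order survives regardless of which partition each key hashes to; what is lost is only the relative order between different keys that end up in different partitions.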

I went through some blogs that suggest buffering the messages and applying sorting logic to them, but this does not seem to be good practice: one partition may be slow, so a message can arrive late, and you would need to re-sort your messages every time a new message arrives.
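For the one-off bootstrap replay described in the question, each partition is read up to a fixed end point, so the input is bounded and the late-message concern is smaller than in a live stream. Under that assumption, and the further assumption that timestamps are non-decreasing within each partition, a k-way merge avoids re-sorting on every message. The following is a minimal sketch on in-memory lists rather than a real consumer; `Record` and `merge_by_timestamp` are hypothetical names introduced for illustration.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Record(NamedTuple):
    timestamp: int  # event time, e.g. epoch milliseconds
    partition: int
    offset: int
    value: str


def merge_by_timestamp(partitions: Iterable[Iterable[Record]]) -> Iterator[Record]:
    """Lazily merge per-partition streams that are each already sorted by timestamp.

    Ties on timestamp are broken by (partition, offset) so the output is deterministic.
    """
    return heapq.merge(*partitions, key=lambda r: (r.timestamp, r.partition, r.offset))


# Two partitions, each internally ordered, as read during a bounded replay.
p0 = [Record(100, 0, 0, "a"), Record(400, 0, 1, "d")]
p1 = [Record(200, 1, 0, "b"), Record(300, 1, 1, "c"), Record(500, 1, 2, "e")]

merged = list(merge_by_timestamp([p0, p1]))
ordered_values = [r.value for r in merged]
print(ordered_values)
```

If timestamps are not ordered within a partition, this merge does not produce a globally sorted result, and a full sort of the replayed data (as in the Map-Reduce idea from the question) would be needed instead.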
