How can I consume data sequentially (in timestamp order) from a multi-partitioned Kafka topic?

Question

I know that Kafka cannot guarantee the ordering of data when a topic has multiple partitions. My problem is this: I need multiple partitions on an event topic (user activities generate the events) because I want multiple consumer groups to consume the data from the topic. But there are times when I need to bootstrap the entire data set, i.e. read the complete data from beginning to end and rebuild my graph of events from the historical messages in Kafka. When I do that I lose the ordering, which creates a problem. One approach might be to process the data in a Map-Reduce paradigm, where I map the data by time, order it, and then consume it. Has anybody faced a similar situation or problem, and can you point me to the right approach or solution?

Thanks.

Recommended answer

As per the Kafka documentation, global ordering across partitions is not guaranteed; ordering is maintained only within a single partition. So you can create N partitions with N consumers. Partition the data by type, i.e. all data of category A should go to one partition. Because the order of messages is maintained within a partition, you can consume those messages in a separate consumer and process them in order.
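The idea of routing by category can be illustrated with a small simulation that does not talk to Kafka at all. It is only a sketch: `choose_partition`, `route`, `NUM_PARTITIONS`, and the sample events are hypothetical names and data, and CRC32 is used as a stand-in hash (Kafka's own default partitioner uses a different hash function). The point it shows is that every event with the same key lands in the same partition, so the relative order of that key's events is preserved.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count, chosen for illustration


def choose_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition index.

    Stand-in for a key-based partitioner; CRC32 is an assumption here,
    not the hash Kafka itself uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events):
    """Append each (key, payload) event to the partition chosen by its key."""
    partitions = defaultdict(list)
    for key, payload in events:
        partitions[choose_partition(key)].append((key, payload))
    return partitions


# Hypothetical user-activity events, listed in the order they were produced.
events = [
    ("user-a", "login"),
    ("user-b", "login"),
    ("user-a", "click"),
    ("user-a", "logout"),
    ("user-b", "logout"),
]

partitions = route(events)

# Within its partition, each key's events keep their production order.
for key in ("user-a", "user-b"):
    in_partition = [p for k, p in partitions[choose_partition(key)] if k == key]
    print(key, in_partition)
```

Two different keys may share a partition, which is fine: ordering per key still holds, only the ordering between keys in different partitions is lost.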

I went through some blogs that suggest buffering the messages and applying sorting logic to them, but this does not seem to be good practice: one of the partitions may be slow, so some messages arrive late, and you would need to re-sort your messages every time a new message arrives.
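For the bounded bootstrap case described in the question, where each partition is read from beginning to end, the buffering idea reduces to a k-way merge rather than a repeated re-sort. The sketch below assumes that each partition's messages have already been fetched into memory and are individually sorted by timestamp; that is an assumption, since Kafka orders a partition by offset and not necessarily by timestamp. `Event`, `merge_by_timestamp`, and the sample data are hypothetical names for illustration, and no Kafka client is involved.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    timestamp: int  # e.g. epoch milliseconds; hypothetical field
    partition: int
    value: str


def merge_by_timestamp(partitions: Iterable[Iterable[Event]]) -> Iterator[Event]:
    """Lazily merge per-partition streams into one timestamp-ordered stream.

    Assumes each input stream is already sorted by timestamp.
    """
    return heapq.merge(*partitions, key=lambda e: e.timestamp)


# Hypothetical contents of two partitions, each in timestamp order.
p0 = [Event(1, 0, "a"), Event(4, 0, "d")]
p1 = [Event(2, 1, "b"), Event(3, 1, "c"), Event(5, 1, "e")]

ordered = list(merge_by_timestamp([p0, p1]))
for event in ordered:
    print(event.timestamp, event.partition, event.value)
```

This only works for a replay with a known end. For a live stream the concern raised above still applies: a slow partition delays the merge, because the next event cannot be emitted until every partition has shown its next timestamp.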
