How to get data from an old offset point in Kafka?


Question

I am using ZooKeeper to get data from Kafka, and I always get data starting from the last offset. Is there any way to specify a point in time so that I can fetch older data?

There is an option called autooffset.reset that accepts smallest or largest. Can someone explain what smallest and largest mean? Can autooffset.reset help in getting data from an old offset instead of the latest one?

Recommended answer

A consumer always belongs to a group and, for each partition, ZooKeeper keeps track of that consumer group's progress in the partition.

To fetch from the beginning, you can delete all the data associated with the group's progress, as Hussain mentioned:

ZkUtils.maybeDeletePath("${zkhost:zkport}", "/consumers/${group.id}");
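
Once the group's progress has been deleted, the consumer has no stored offset to resume from, and that is the case the autooffset.reset option from the question is meant for: smallest starts at the earliest offset still available on the broker, largest starts at the latest. The sketch below only builds the properties with the JDK; the key names `zk.connect` and `groupid`, the host, and the group name are assumptions based on the older ZooKeeper-based consumer and may differ in your Kafka version.

```java
import java.util.Properties;

public class ReplayConsumerConfig {

    // Builds consumer properties for a group that has no stored offsets.
    // "zk.connect" and "groupid" are assumed key names; check your version's docs.
    static Properties build(String zkConnect, String groupId, boolean fromBeginning) {
        Properties props = new Properties();
        props.setProperty("zk.connect", zkConnect);
        props.setProperty("groupid", groupId);
        // smallest = earliest available offset, largest = latest offset
        props.setProperty("autooffset.reset", fromBeginning ? "smallest" : "largest");
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical host and group name, for illustration only.
        Properties props = build("localhost:2181", "replay-group", true);
        System.out.println(props.getProperty("autooffset.reset"));
    }
}
```

Note that this setting only applies when no valid offset is stored for the group, which is why it does not help while ZooKeeper still holds the group's progress.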

You can also set the offset you want for a partition, as done in core/src/main/scala/kafka/tools/UpdateOffsetsInZK.scala:

ZkUtils.updatePersistentPath(zkClient, topicDirs.consumerOffsetDir + "/" + partition, offset.toString)

However, offsets are not indexed by time; what you do know is that each partition is an ordered sequence.

If your messages contain a timestamp (beware that this timestamp has nothing to do with the moment Kafka received the message), you can build an indexer that walks the partition in steps, retrieving one entry every N offsets and storing a tuple such as (topic X, partition 2, offset 100, timestamp) somewhere.
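
A minimal sketch of such an indexer is shown here, using only the JDK. The `timestampAt` function is a hypothetical stand-in for "fetch the message at this offset and read its embedded timestamp"; how you actually fetch a single message depends on your consumer API and is not shown. The class and field names are illustrative as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongUnaryOperator;

public class SparseOffsetIndexer {

    /** One sampled point: where the message sits and the timestamp it carries. */
    static final class IndexEntry {
        final String topic;
        final int partition;
        final long offset;
        final long timestamp;

        IndexEntry(String topic, int partition, long offset, long timestamp) {
            this.topic = topic;
            this.partition = partition;
            this.offset = offset;
            this.timestamp = timestamp;
        }
    }

    /**
     * Samples one entry every `step` offsets in [firstOffset, endOffset).
     * timestampAt is a hypothetical reader supplied by the caller.
     */
    static List<IndexEntry> build(String topic, int partition, long firstOffset,
                                  long endOffset, long step, LongUnaryOperator timestampAt) {
        if (step <= 0) {
            throw new IllegalArgumentException("step must be positive");
        }
        List<IndexEntry> index = new ArrayList<>();
        for (long offset = firstOffset; offset < endOffset; offset += step) {
            index.add(new IndexEntry(topic, partition, offset, timestampAt.applyAsLong(offset)));
        }
        return index;
    }

    public static void main(String[] args) {
        // Fake reader for illustration: pretends each message is 10 ms after the previous one.
        LongUnaryOperator fakeReader = offset -> 1_000L + offset * 10;
        List<IndexEntry> index = build("X", 2, 0L, 500L, 100L, fakeReader);
        for (IndexEntry e : index) {
            System.out.println(e.topic + " " + e.partition + " " + e.offset + " " + e.timestamp);
        }
    }
}
```

A larger N keeps the index small but means more messages to scan after each lookup; a smaller N does the opposite.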

When you want to retrieve entries from a specified moment in time, you can run a binary search over this rough index to find the closest entry and fetch from there.
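
The lookup could be sketched as follows. It assumes the index is held as two parallel arrays sorted by offset and that message timestamps do not decrease along a partition, which the answer does not guarantee; if your producers can write out-of-order timestamps, treat the result as an approximation. It returns the sampled offset strictly before the target time, so that a forward scan from there does not miss messages carrying exactly the target timestamp.

```java
public class OffsetLookup {

    /**
     * Returns the offset of the last sampled entry whose timestamp is strictly
     * before `target`, or the first sampled offset when no sample is earlier.
     * Assumes `timestamps` is sorted in non-decreasing order.
     */
    static long startOffsetFor(long[] timestamps, long[] offsets, long target) {
        if (timestamps.length == 0 || timestamps.length != offsets.length) {
            throw new IllegalArgumentException("index must be non-empty and aligned");
        }
        int lo = 0;
        int hi = timestamps.length - 1;
        int best = 0; // fall back to the first sample
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (timestamps[mid] < target) {
                best = mid;      // candidate; look for a later one
                lo = mid + 1;
            } else {
                hi = mid - 1;    // too late; look earlier
            }
        }
        return offsets[best];
    }

    public static void main(String[] args) {
        // Illustrative index: one sample every 100 offsets.
        long[] timestamps = {1000L, 2000L, 3000L, 4000L, 5000L};
        long[] offsets = {0L, 100L, 200L, 300L, 400L};
        System.out.println(startOffsetFor(timestamps, offsets, 3500L));
    }
}
```

From the returned offset you would then read forward, skipping messages until the first one whose timestamp reaches the target.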
