How does Kafka store offsets for each topic?

Problem description

While polling Kafka, I have subscribed to multiple topics using the subscribe() function. Now, I want to set the offset from which I read each topic, without resubscribing after every seek() and poll() on a topic. Will calling seek() iteratively over each of the topic names before polling for data achieve this? How exactly are the offsets stored in Kafka?

I have one partition per topic and just one consumer to read from all topics.

Solution

How does Kafka store offsets for each topic?

Kafka has moved offset storage from ZooKeeper to the Kafka brokers. The reason is given below:

Zookeeper is not a good way to service a high-write load such as offset updates because zookeeper routes each write through every node and hence has no ability to partition or otherwise scale writes. We have always known this, but chose this implementation as a kind of "marriage of convenience" since we already depended on zk.

Kafka stores offset commits in a topic. When a consumer commits an offset, Kafka publishes an offset commit message to a "commit-log" topic (the internal `__consumer_offsets` topic in current versions) and keeps an in-memory structure that maps group/topic/partition to the latest offset for fast retrieval. More design information can be found on this page about offset management.
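The mechanism described above can be sketched as a toy model. This is not Kafka's actual implementation: the `OffsetStore` class and its method names are hypothetical, and the `replay` step is an assumption about how such a cache could be rebuilt from the log. It only illustrates the idea of an append-only commit log paired with an in-memory map keyed by group/topic/partition.

```python
from typing import Dict, List, Optional, Tuple

# Lookup key: (consumer group, topic, partition).
OffsetKey = Tuple[str, str, int]


class OffsetStore:
    """Toy model: an append-only commit log plus an in-memory latest-offset map."""

    def __init__(self) -> None:
        # Stands in for the offsets topic: every commit is kept, in order.
        self.commit_log: List[Tuple[OffsetKey, int]] = []
        # Cache used for fast retrieval: only the newest offset per key.
        self.latest: Dict[OffsetKey, int] = {}

    def commit(self, group: str, topic: str, partition: int, offset: int) -> None:
        key = (group, topic, partition)
        self.commit_log.append((key, offset))  # append the commit message
        self.latest[key] = offset              # overwrite the cached value

    def fetch(self, group: str, topic: str, partition: int) -> Optional[int]:
        # Returns None when this group has never committed for the partition.
        return self.latest.get((group, topic, partition))

    @classmethod
    def replay(cls, commit_log: List[Tuple[OffsetKey, int]]) -> "OffsetStore":
        """Rebuild the cache by reading the log from the beginning."""
        store = cls()
        for (group, topic, partition), offset in commit_log:
            store.commit(group, topic, partition, offset)
        return store
```

Because the key includes the topic and the partition, each topic a group reads has its own independent offset. Committing for one topic does not affect the position stored for another.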

Now, I want to set the offset from which I want to read from each topic, without resubscribing after every seek() and poll() from a topic.

The Kafka admin tools include a newer feature for resetting offsets:

kafka-consumer-groups.sh --bootstrap-server 127.0.0.1:9092 --group your-consumer-group --reset-offsets --to-offset 1 --all-topics --execute

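Other reset targets are available, such as --to-earliest, --to-latest, --shift-by and --to-datetime. Leaving out --execute only previews the change, and the consumer group must be inactive while its offsets are reset.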
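If the offsets should be set from inside the consumer rather than with the admin tool, one possible approach is to assign the partitions manually and call seek() once per partition before the first poll(). Note that seek() takes a topic partition, not a topic name. The sketch below assumes the kafka-python client and one partition (number 0) per topic, as in the question. The helper name `seek_to_start_offsets` and the topic names in the usage note are hypothetical. The fallback namedtuple exists only so the sketch can be exercised without the library installed.

```python
from collections import namedtuple

try:
    # kafka-python's own namedtuple; KafkaConsumer.seek() requires this type.
    from kafka import TopicPartition
except ImportError:
    # Stand-in with the same fields, used only when kafka-python is absent.
    TopicPartition = namedtuple("TopicPartition", ["topic", "partition"])


def seek_to_start_offsets(consumer, start_offsets):
    """Assign partition 0 of each topic, then seek each one to its offset.

    consumer      -- object with assign(partitions) and seek(partition, offset),
                     e.g. a kafka-python KafkaConsumer created without topics
    start_offsets -- dict mapping topic name to the offset to start reading from
    """
    # Assumption from the question: every topic has exactly one partition.
    partitions = [TopicPartition(topic, 0) for topic in sorted(start_offsets)]

    # Manual assignment replaces subscribe(); the two cannot be combined.
    consumer.assign(partitions)

    # One seek per partition; the next poll() starts from these positions.
    for tp in partitions:
        consumer.seek(tp, start_offsets[tp.topic])
    return partitions
```

With a real client this might be called as `seek_to_start_offsets(KafkaConsumer(bootstrap_servers="127.0.0.1:9092", group_id="your-consumer-group"), {"orders": 42, "payments": 7})`, followed by the usual poll() loop. When subscribe() is kept instead, seek() can only be called once partitions have actually been assigned, for example from a rebalance listener.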
