Understanding Kafka Topics and Partitions
Question
I am starting to learn Kafka for enterprise solution purposes.
While reading, a few questions came to mind:
- When a producer is producing a message, it specifies the topic it wants to send the message to, is that right? Does it care about partitions?
- When a subscriber is running, does it specify its group id so that it can be part of a cluster of consumers of the same topic, or of several topics that this group of consumers is interested in?
- Does each consumer group have a corresponding partition on the broker, or does each consumer have one?
- Are the partitions created by the broker, and therefore not a concern for the consumers?
- Since this is a queue with an offset for each partition, is it the responsibility of the consumer to specify which messages it wants to read? Does it need to save its state?
- What happens when a message is deleted from the queue? For example, the retention was 3 hours and that time has passed: how is the offset handled on both sides?
Recommended answer
This post already has answers, but I am adding a few points drawn from Kafka: The Definitive Guide.
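Before answering each question, here is a brief overview of the producer: it serializes each record, picks a partition for it, batches records per partition, and sends each batch to the broker that leads that partition.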
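1. When a producer is producing a message, it specifies the topic it wants to send the message to, is that right? Does it care about partitions?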
Yes, the producer names the topic, and it also decides the target partition for each message, depending on:
- The partition id, if one is specified in the message
- hash(key) % number of partitions, if no partition id is given but the message has a key
- Round robin, if the message has neither a partition id nor a key (only a value). Newer clients replace plain round robin with a "sticky" strategy that fills a batch for one partition before moving to the next.
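The three rules above can be sketched as a small, self-contained model. This is an illustration only, not Kafka's partitioner: the class name `SketchPartitioner` is hypothetical, and `zlib.crc32` is a stand-in for the murmur2 hash that the Java client applies to the serialized key.

```python
import itertools
import zlib


class SketchPartitioner:
    """Toy model of producer-side partition selection (not Kafka's real code)."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        # Counter used only for records that carry neither partition nor key.
        self._counter = itertools.count()

    def choose(self, partition=None, key=None):
        # 1. An explicit partition id always wins.
        if partition is not None:
            return partition
        # 2. Otherwise hash the key; equal keys always map to the same partition.
        #    crc32 is a stand-in here; the Java client hashes with murmur2.
        if key is not None:
            return zlib.crc32(key) % self.num_partitions
        # 3. No partition and no key: spread records round robin.
        return next(self._counter) % self.num_partitions


p = SketchPartitioner(num_partitions=3)
explicit = p.choose(partition=2, key=b"user-1")       # explicit id overrides the key
keyed = [p.choose(key=b"user-1") for _ in range(3)]   # same key, same partition
unkeyed = [p.choose() for _ in range(4)]              # cycles through partitions
```

One consequence worth noting: because the key rule depends on the number of partitions, adding partitions to a topic later can change which partition a given key maps to.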
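2. When a subscriber is running, does it specify its group id so that it can be part of a cluster of consumers of the same topic, or of several topics that this group of consumers is interested in?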
Yes. Consumers that share a group.id form one consumer group and divide the partitions of their subscribed topics among themselves. You should always configure group.id unless you are using the simple assignment API and you don't need to store offsets in Kafka; in that case the consumer will not be part of any group.
3. Does each consumer group have a corresponding partition on the broker, or does each consumer have one?
Within one consumer group, each partition is processed by exactly one consumer. These are the possible scenarios:
- The number of consumers is less than the number of topic partitions: multiple partitions are assigned to some of the consumers in the group.
- The number of consumers equals the number of topic partitions: each consumer is assigned exactly one partition.
- The number of consumers is higher than the number of topic partitions: the surplus consumers receive no partition and sit idle (for example, Consumer 5 when the topic has only four partitions), so adding them is not effective.
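The three scenarios can be reproduced with a simplified assignment function. This is a sketch under stated assumptions: `assign_round_robin` is a hypothetical helper, and Kafka's real assignors (range, round robin, sticky, cooperative sticky) are more involved, but the invariant is the same: one owner per partition within a group.

```python
def assign_round_robin(partitions, consumers):
    """Give every partition to exactly one consumer in the group."""
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        # Walk the consumer list cyclically so the load is spread evenly.
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2, 3]

# Fewer consumers than partitions: each consumer owns several partitions.
fewer = assign_round_robin(partitions, ["c1", "c2"])

# As many consumers as partitions: one partition each.
equal = assign_round_robin(partitions, ["c1", "c2", "c3", "c4"])

# More consumers than partitions: c5 ends up with an empty list (idle).
more = assign_round_robin(partitions, ["c1", "c2", "c3", "c4", "c5"])
```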
4. Are the partitions created by the broker, and therefore not a concern for the consumers?
Consumers should be aware of the number of partitions, as discussed in question 3.
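5. Since this is a queue with an offset for each partition, is it the responsibility of the consumer to specify which messages it wants to read? Does it need to save its state?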
Kafka (specifically, the Group Coordinator) takes care of the offset state by writing a message to the internal __consumer_offsets topic. By default the consumer commits its offsets automatically; setting enable.auto.commit to false switches this to manual commits. In that case consumer.commitSync() and consumer.commitAsync() are the calls used to manage offsets.
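To make the idea of committed offsets concrete, here is an in-memory model rather than a real Kafka client. `OffsetStore` and `consume` are hypothetical names: the store plays the role of the __consumer_offsets topic, and the `commit` call plays the role of a manual commit after each processed record.

```python
class OffsetStore:
    """Stands in for the __consumer_offsets topic: (group, partition) -> offset."""

    def __init__(self):
        self._committed = {}

    def commit(self, group, partition, next_offset):
        self._committed[(group, partition)] = next_offset

    def fetch(self, group, partition):
        return self._committed.get((group, partition))


def consume(log, store, group, partition, max_records):
    """Process up to max_records, committing after each one (manual commit style)."""
    # Resume from the last committed offset, or from 0 if the group is new.
    position = store.fetch(group, partition) or 0
    processed = []
    for offset in range(position, min(position + max_records, len(log))):
        processed.append(log[offset])
        # Commit the offset of the NEXT record to read, as Kafka consumers do.
        store.commit(group, partition, offset + 1)
    return processed


log = ["a", "b", "c", "d", "e"]
store = OffsetStore()
first_run = consume(log, store, "billing", 0, max_records=2)
second_run = consume(log, store, "billing", 0, max_records=2)  # "restarted" consumer
other_group = consume(log, store, "audit", 0, max_records=1)   # independent offsets
```

The second run picks up where the first one stopped, while the other group starts from the beginning: offsets are tracked per group and per partition, and reading does not remove anything from the log.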
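More about the Group Coordinator:
- It is one of the brokers in the cluster, elected on the Kafka server side.
- Consumers interact with the Group Coordinator for offset commit and offset fetch requests.
- Consumers send periodic heartbeats to the Group Coordinator.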
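6. What happens when a message is deleted from the queue? For example, the retention was 3 hours and that time has passed: how is the offset handled on both sides?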
On the broker side, expired messages are deleted and the earliest available offset of the partition moves forward; offsets are never reused. If a consumer starts after the retention period and its position no longer exists, messages are consumed according to the auto.offset.reset configuration, which can be latest or earliest. In practice the effect is that of latest (start processing new messages), because all earlier messages have expired by then. Note that retention is a topic-level configuration.
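The reset rule can be written down as a small decision function. This is a simplified sketch, not client code: `starting_offset` is a hypothetical helper, `log_start` means the oldest offset still retained, and `log_end` means the offset the next produced record will receive.

```python
def starting_offset(committed, log_start, log_end, auto_offset_reset):
    """Pick where a consumer begins reading one partition."""
    # A committed offset that still points into the retained log is used as-is.
    if committed is not None and log_start <= committed <= log_end:
        return committed
    # No usable position (never committed, or expired): apply the reset policy.
    if auto_offset_reset == "earliest":
        return log_start
    if auto_offset_reset == "latest":
        return log_end
    raise ValueError(f"no valid offset and auto.offset.reset={auto_offset_reset!r}")


# Everything up to offset 99 has expired and nothing new has arrived, so the
# log is empty: earliest and latest lead to the same starting point.
expired_earliest = starting_offset(40, 100, 100, "earliest")
expired_latest = starting_offset(40, 100, 100, "latest")

# A committed offset inside the retained range is honored regardless of policy.
still_valid = starting_offset(120, 100, 150, "latest")

# A brand-new group with "earliest" starts at the oldest retained record.
new_group = starting_offset(None, 100, 150, "earliest")
```

The first two calls show why the answer says the outcome is effectively latest once every message has expired: with an empty log, both policies point at the same offset.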