Understanding Kafka Topics and Partitions

Problem description

I am starting to learn Kafka, and while reading about it some questions came to mind:

  1. When a producer is producing a message - it will specify the topic it wants to send the message to, is that right? Does it care about partitions?

  2. When a subscriber is running - does it specify its group id so that it can be part of a cluster of consumers of the same topic or several topics that this group of consumers is interested in?

  3. Does each consumer group have a corresponding partition on the broker or does each consumer have one?

  4. Are the partitions created by the broker, and therefore not a concern for the consumers?

  5. Since this is a queue with an offset for each partition, is it the responsibility of the consumer to specify which messages it wants to read? Does it need to save its state?

  6. What happens when a message is deleted from the queue? - For example, the retention was for 3 hours, then the time passes, how is the offset being handled on both sides?

Solution

This question already has answers, but I am adding my view with a few pictures from Kafka: The Definitive Guide.

Before answering the questions, let's look at an overview of producer components:

1. When a producer is producing a message - it will specify the topic it wants to send the message to, is that right? Does it care about partitions?

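The producer decides which partition each message goes to, depending on:

  • Partition id: if one is specified in the message, it is used as-is.
  • hash(key) % number of partitions: if no partition id is given but the message has a key. The default partitioner hashes the key bytes with murmur2, so all messages with the same key land in the same partition.
  • Round robin: if neither a partition id nor a key is present, meaning only the value is available. Since Kafka 2.4 the default partitioner uses a "sticky" strategy for this case instead, filling a batch for one partition before moving on to the next, which still spreads keyless messages across partitions.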
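The three rules above can be sketched as a small Python function. This is a simplified model under stated assumptions, not the Java client's code: `choose_partition` is a hypothetical helper, `zlib.crc32` is a stand-in for the murmur2 hash the real default partitioner uses (so the partition numbers will not match a real cluster), and the keyless case uses plain round robin rather than the sticky strategy of newer clients.

```python
import itertools
import zlib
from typing import Iterator, Optional


def choose_partition(
    num_partitions: int,
    key: Optional[bytes] = None,
    partition: Optional[int] = None,
    rr_counter: Optional[Iterator[int]] = None,
) -> int:
    """Simplified model of how a producer picks a partition (hypothetical helper)."""
    # Rule 1: an explicit partition id in the message is used as-is.
    if partition is not None:
        if not 0 <= partition < num_partitions:
            raise ValueError(f"partition {partition} out of range")
        return partition
    # Rule 2: a key is hashed and taken modulo the partition count.
    # ASSUMPTION: zlib.crc32 stands in for Kafka's murmur2 hash.
    if key is not None:
        return zlib.crc32(key) % num_partitions
    # Rule 3: no partition id and no key -> plain round robin.
    # (Newer clients use a sticky strategy here instead.)
    if rr_counter is None:
        raise ValueError("round robin needs a shared counter")
    return next(rr_counter) % num_partitions


counter = itertools.count()
print(choose_partition(3, partition=2))  # → 2
print(choose_partition(3, key=b"user-42") == choose_partition(3, key=b"user-42"))  # → True
print([choose_partition(3, rr_counter=counter) for _ in range(4)])  # → [0, 1, 2, 0]
```

The key rule is what gives per-key ordering: as long as the partition count does not change, every message with the same key goes to the same partition.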

2. When a subscriber is running - does it specify its group id so that it can be part of a cluster of consumers of the same topic or several topics that this group of consumers is interested in?

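You should always configure group.id unless you are using the simple assignment API (assigning partitions to the consumer yourself with assign()) and you don't need to store offsets in Kafka. In that case the consumer is not part of any group. (source) A consumer that does belong to a group can subscribe to several topics at once; the partitions of all those topics are then divided among the group's members.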
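To see why the group id matters, here is a toy in-memory model. It involves no Kafka client, and `ToyTopic` and its methods are hypothetical names: committed offsets are kept per (group id, partition), a stand-in for what Kafka stores in __consumer_offsets, so two different groups each read every message while consumers sharing one group id share one set of offsets.

```python
from collections import defaultdict


class ToyTopic:
    """In-memory stand-in for one topic: a list of messages per partition."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions = [[] for _ in range(num_partitions)]
        # Committed offsets keyed by (group_id, partition); a toy stand-in
        # for Kafka's internal __consumer_offsets topic.
        self.committed = defaultdict(int)

    def append(self, partition: int, value: str) -> None:
        self.partitions[partition].append(value)

    def poll(self, group_id: str, partition: int) -> list:
        """Return this group's unread messages and commit past them."""
        start = self.committed[(group_id, partition)]
        batch = self.partitions[partition][start:]
        # The committed offset is the offset of the *next* message to read.
        self.committed[(group_id, partition)] = start + len(batch)
        return batch


topic = ToyTopic(num_partitions=1)
for v in ["a", "b", "c"]:
    topic.append(0, v)

print(topic.poll("billing", 0))    # → ['a', 'b', 'c']
print(topic.poll("analytics", 0))  # → ['a', 'b', 'c']  (independent group)
print(topic.poll("billing", 0))    # → []  (billing already consumed them)
```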

3. Does each consumer group have a corresponding partition on the broker or does each consumer have one?

Partitions belong to the topic, not to a consumer group or to a consumer. Every consumer group reads all partitions of the topics it subscribes to, and within one group each partition is processed by only one consumer. These are the possible scenarios:

  • Fewer consumers than topic partitions: some consumers in the group are assigned more than one partition.
  • As many consumers as topic partitions: each consumer is assigned exactly one partition.
  • More consumers than topic partitions: the extra consumers are assigned no partitions and sit idle (Consumer 5 in the book's figure), so adding them does not increase throughput.
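The three scenarios above fall out of any assignment that gives each partition to exactly one group member. Below is a minimal sketch for a single topic, loosely modeled on the idea behind Kafka's RangeAssignor (sort the members, give each a contiguous block, hand the remainder to the first members); `assign_range` is a hypothetical helper, not the real assignor, which also handles multiple topics and other details.

```python
def assign_range(consumers: list[str], num_partitions: int) -> dict[str, list[int]]:
    """Split one topic's partitions among a group's consumers, range-style.

    Simplified, single-topic illustration: every partition goes to exactly
    one consumer; the first (num_partitions % len(consumers)) consumers in
    sorted order get one extra partition.
    """
    members = sorted(consumers)
    if not members:
        return {}
    per_consumer, extra = divmod(num_partitions, len(members))
    assignment = {}
    start = 0
    for i, member in enumerate(members):
        count = per_consumer + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


print(assign_range(["c1", "c2"], 3))
# → {'c1': [0, 1], 'c2': [2]}  (fewer consumers than partitions)
print(assign_range(["c1", "c2", "c3", "c4"], 4))
# → {'c1': [0], 'c2': [1], 'c3': [2], 'c4': [3]}
print(assign_range(["c1", "c2", "c3", "c4", "c5"], 4))
# → {'c1': [0], 'c2': [1], 'c3': [2], 'c4': [3], 'c5': []}  (c5 sits idle)
```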

4. Are the partitions created by the broker, and therefore not a concern for the consumers?

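Partitions are created together with the topic, either explicitly (for example with the kafka-topics tool or the Admin API) or automatically by the broker when topic auto-creation is enabled. Consumers do not create them, but they should still be aware of the number of partitions because, as discussed in question 3, it limits how many consumers in a group can do useful work.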

5. Since this is a queue with an offset for each partition, is it the responsibility of the consumer to specify which messages it wants to read? Does it need to save its state?

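Kafka (specifically the group coordinator) keeps the offset state: committed offsets are stored as messages in the internal __consumer_offsets topic. By default the consumer commits its offsets automatically (enable.auto.commit=true, every auto.commit.interval.ms); this can be switched to manual by setting enable.auto.commit to false. In that case consumer.commitSync() and consumer.commitAsync() can be used to manage offsets. The committed offset is the offset of the next message to read, and a consumer can also jump to a specific offset with consumer.seek().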
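With enable.auto.commit set to false, the usual pattern is: read, process, then commit the offset of the next message to read. The toy below (hypothetical names, no Kafka client; a plain dict stands in for __consumer_offsets) shows why committing after processing gives at-least-once delivery: if the consumer crashes between processing and committing, the restarted consumer re-reads the uncommitted messages.

```python
def run_consumer(log, committed, group, process, crash_before_commit=False):
    """Process one partition from the last committed offset, then commit.

    log:       list of messages in one partition (offset == list index)
    committed: dict standing in for __consumer_offsets, keyed by group id
    """
    position = committed.get(group, 0)  # resume from the last commit
    for offset in range(position, len(log)):
        process(log[offset])
        position = offset + 1  # next message to read
    if crash_before_commit:
        # Simulated crash: messages were processed but nothing was committed.
        raise RuntimeError("consumer crashed before committing")
    # Manual commit of "last processed offset + 1".
    committed[group] = position


log = ["m0", "m1", "m2"]
committed = {}
seen = []
try:
    run_consumer(log, committed, "g", seen.append, crash_before_commit=True)
except RuntimeError:
    pass
run_consumer(log, committed, "g", seen.append)
print(seen)       # → ['m0', 'm1', 'm2', 'm0', 'm1', 'm2']  (replayed after the crash)
print(committed)  # → {'g': 3}
```

Committing before processing would flip this to at-most-once: a crash after the commit would skip the messages that were not yet processed.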

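More about the group coordinator:

  1. It is one of the brokers in the cluster, chosen on the Kafka server side: specifically, the leader of the __consumer_offsets partition that the group's group.id maps to.
  2. Consumers talk to the group coordinator to join the group and to send offset commit and offset fetch requests. (Fetching the messages themselves goes to each partition's leader broker.)
  3. Consumers send periodic heartbeats to the group coordinator; if a consumer stops sending them, the coordinator considers it dead and triggers a rebalance.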
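Which broker acts as a group's coordinator is derived from the group id. As I understand the broker code, the group id is mapped to a partition of __consumer_offsets via abs(groupId.hashCode()) % offsets.topic.num.partitions (default 50), and the current leader of that partition is the coordinator. The sketch below reproduces Java's String.hashCode in Python to compute that partition; the helper names are hypothetical, and which broker actually leads that partition depends on your cluster.

```python
def java_string_hash(s: str) -> int:
    """Java's String.hashCode(): s[0]*31^(n-1) + ... + s[n-1], as a signed 32-bit int."""
    h = 0
    data = s.encode("utf-16-be")  # Java strings are sequences of UTF-16 code units
    for i in range(0, len(data), 2):
        unit = (data[i] << 8) | data[i + 1]
        h = (31 * h + unit) & 0xFFFFFFFF  # wrap like 32-bit int arithmetic
    # Reinterpret the unsigned 32-bit value as a signed Java int.
    return h - (1 << 32) if h >= (1 << 31) else h


def offsets_partition_for(group_id: str, num_offsets_partitions: int = 50) -> int:
    """__consumer_offsets partition whose leader coordinates this group (assumed formula)."""
    h = java_string_hash(group_id)
    # ASSUMPTION: mirrors Kafka's Utils.abs, which maps Integer.MIN_VALUE to 0.
    h = 0 if h == -(1 << 31) else abs(h)
    return h % num_offsets_partitions


print(java_string_hash("hello"))       # → 99162322
print(offsets_partition_for("hello"))  # → 22
```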

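6. What happens when a message is deleted from the queue? - For example, the retention was for 3 hours, then the time passes, how is the offset being handled on both sides?

If a consumer starts after the retention period, or its committed offset points at messages that have already been deleted, its starting position is decided by the auto.offset.reset configuration, which can be earliest or latest (the default is latest). On the broker side, offsets are never renumbered: deleting expired messages only moves the partition's oldest available offset forward, and new messages keep getting increasing offsets. Technically, if all messages have expired by that time, earliest and latest point to the same place, so the consumer simply starts with new messages. Note that retention is a topic-level configuration (retention.ms), and since the broker deletes whole log segments, messages may stay around somewhat longer than the configured time.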
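The decision a consumer makes about where to start reading a partition can be sketched as follows. This is a simplified model, not the client's code: `resolve_start_offset` is a hypothetical helper, and raising `LookupError` for "none" stands in for the exception the real client throws. log_start is the oldest offset still on disk (it advances as retention deletes data) and log_end is the offset the next produced message will get.

```python
def resolve_start_offset(committed, log_start, log_end, reset="latest"):
    """Pick where a consumer starts reading one partition (simplified model).

    committed: the group's committed offset, or None if it has none
    log_start: oldest offset still retained (moves forward as data expires)
    log_end:   offset the next produced message will receive
    reset:     value of auto.offset.reset: "earliest", "latest" or "none"
    """
    # A committed offset that still points inside the retained log is used as-is.
    if committed is not None and log_start <= committed <= log_end:
        return committed
    # No committed offset, or it points at data retention already deleted.
    if reset == "earliest":
        return log_start
    if reset == "latest":
        return log_end
    if reset == "none":
        raise LookupError("no valid offset and auto.offset.reset=none")
    raise ValueError(f"unknown auto.offset.reset value: {reset!r}")


# Offsets 0..99 were written and retention has deleted all of them.
print(resolve_start_offset(40, 100, 100, "earliest"))  # → 100
print(resolve_start_offset(40, 100, 100, "latest"))    # → 100
# Only offsets 0..59 were deleted; offsets 60..99 are still retained.
print(resolve_start_offset(40, 60, 100, "earliest"))   # → 60
print(resolve_start_offset(70, 60, 100, "latest"))     # → 70
```

The first two calls show the point made above: once everything has expired, earliest and latest both land on the next new message.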
