Is it possible to create a Kafka topic with a dynamic partition count?


Question

I am using Kafka to stream page-visit events from website users to an analytics service. Each event will contain the following details for the consumer:

  • userId
  • IP address of the user

I need very high throughput, so I decided to partition the topic using userId-ipAddress as the partition key, i.e.:

For userId 1000 and IP address 10.0.0.1, the event will have the partition key "1000-10.0.0.1".

In this use case the partition key is dynamic, so I cannot specify the number of partitions upfront when creating the topic. Is it possible to create a topic in Kafka with a dynamic partition count?

Is this kind of partitioning good practice, or is there another way to achieve this?

Recommended answer

It's not possible to create a Kafka topic with a dynamic partition count. When you create a topic, you have to specify the number of partitions. You can increase it later manually using the Replication Tools (Kafka does not support reducing the partition count of an existing topic).

But I don't understand why you need a dynamic partition count in the first place. The partition key is not related to the number of partitions. You can use your partition key with ten partitions or with a thousand partitions. When you send a message to a Kafka topic, Kafka must send it to a specific partition. Every partition is identified by its ID, which is simply a number. Kafka computes something like this:

partition_id = hash(partition_key) % number_of_partitions

and sends the message to partition partition_id. If you have far more users than partitions, you should be OK. More suggestions:

  • Use userId as the partition key. You probably don't need the IP address as part of the partition key. What is it good for? Typically you need all messages from a single user to end up in a single partition. If the IP address is part of the partition key, messages from a single user could end up in multiple partitions. I don't know your use case, but in general that's not what you want.
  • Measure how many partitions you need to process all messages. Then create, let's say, ten times more partitions. You can create more partitions than you actually need. At that scale Kafka won't mind and the overhead is small, although very large partition counts do cost extra file handles, memory, and failover time. See How to choose the number of topics/partitions in a Kafka cluster?
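To make the formula and the first suggestion concrete, here is a minimal Python sketch of key-based partition assignment. It is only an illustration: `partition_for` is a hypothetical helper, and `zlib.crc32` is a stand-in hash chosen because it is stable across runs (Python's built-in `hash()` is randomized per process for strings). Kafka's own producer uses a different hash function, so the actual partition numbers will not match a real cluster.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition id as hash(key) % num_partitions.

    zlib.crc32 is an assumption made for this sketch, not Kafka's hash.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 10  # fixed when the topic is created

# Three events from the same user, arriving from different IP addresses.
events = [("1000", "10.0.0.1"), ("1000", "10.0.0.2"), ("1000", "192.168.1.7")]

# Keyed by userId only: the key is identical, so the partition is identical.
by_user = {partition_for(user_id, NUM_PARTITIONS) for user_id, _ in events}

# Keyed by userId-ipAddress: the keys differ, so the same user's events
# may be spread over several partitions.
by_user_ip = {
    partition_for(f"{user_id}-{ip}", NUM_PARTITIONS) for user_id, ip in events
}

print("partitions used with userId key:", sorted(by_user))
print("partitions used with userId-ipAddress key:", sorted(by_user_ip))
```

The number of distinct keys can be arbitrarily large; only the result of the modulo has to fit into the fixed partition count.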

With that in place, you should be able to process all messages in your system. If traffic grows, you can add more Kafka brokers and use the Replication Tools to change the leaders/replicas for partitions. If traffic grows more than tenfold, you must create new partitions.
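One consequence of the modulo-based assignment is worth keeping in mind before creating new partitions: when the partition count changes, the same key can map to a different partition, so a user's new events may land somewhere other than their old ones. The sketch below illustrates this with the same assumptions as the formula above; `partition_for` and `moved_keys` are hypothetical helpers, and `zlib.crc32` is a stand-in for Kafka's actual hash.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """hash(key) % num_partitions, with crc32 as a stand-in hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def moved_keys(keys, old_count, new_count):
    """Return the keys whose partition differs between the two counts."""
    return [
        key
        for key in keys
        if partition_for(key, old_count) != partition_for(key, new_count)
    ]


# Illustrative userIds only; not taken from any real system.
user_ids = [str(uid) for uid in range(1000, 2000)]

moved = moved_keys(user_ids, old_count=10, new_count=20)
print(f"{len(moved)} of {len(user_ids)} keys change partition (10 -> 20)")
```

If per-user ordering across the resize matters, this is one more reason to over-provision partitions at creation time, as suggested above, rather than to resize often.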
