Is it possible to create a Kafka topic with a dynamic partition count?

Question

I am using Kafka to stream page-visit events from website users to an analytics service. Each event will contain the following details for the consumer:

  • User ID
  • The user's IP address

I need very high throughput, so I decided to partition the topic using userId-ipAddress as the partition key, i.e.

For userId 1000 and IP address 10.0.0.1, the event will have the partition key "1000-10.0.0.1".

In this use case the partition key is dynamic, so I cannot specify the number of partitions upfront when creating the topic. Is it possible to create a topic in Kafka with a dynamic partition count?

Is it good practice to use this kind of partitioning, or is there another way to achieve this?

Recommended answer

It's not possible to create a Kafka topic with a dynamic partition count. When you create a topic, you have to specify the number of partitions. You can increase it later manually using the Replication Tools (Kafka does not support decreasing it).

But I don't understand why you need a dynamic partition count in the first place. The partition key is not related to the number of partitions. You can use your partition key with ten partitions or with a thousand partitions. When you send a message to a Kafka topic, Kafka must send it to a specific partition. Every partition is identified by its ID, which is simply a number. Kafka computes something like this:

partition_id = hash(partition_key) % number_of_partitions

and sends the message to the partition with that partition_id. If you have far more users than partitions, you should be OK. More suggestions:

  • Use userId as the partition key. You probably don't need the IP address as part of the partition key. What is it good for? Typically you want all messages from a single user to end up in a single partition. If the IP address is part of the partition key, the messages from a single user could end up in multiple partitions. I don't know your use case, but in general that's not what you want.
  • Measure how many partitions you need to process all messages. Then create, say, ten times as many. You can create more partitions than you currently need; a moderate surplus costs little, although very high partition counts do add broker overhead. See "How to choose the number of topics/partitions in a Kafka cluster?"
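
The key-to-partition mapping described above can be sketched in a few lines of Python. This is only an illustration, not Kafka's actual code: zlib.crc32 stands in for the hash function (Kafka's Java default partitioner uses murmur2 on the key bytes), and partition_for, partitions_used, and the sample IP addresses are hypothetical names and values chosen for the example.

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition id as hash(key) % num_partitions."""
    # Assumption: crc32 is a stand-in for a stable hash; Kafka itself uses murmur2.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def partitions_used(keys, num_partitions):
    """Return the set of partition ids that the given keys land in."""
    return {partition_for(k, num_partitions) for k in keys}

NUM_PARTITIONS = 10
# Hypothetical IP addresses from which user 1000 visits the site.
ips = ["10.0.0.1", "10.0.0.2", "192.168.1.7", "172.16.0.9"]

# Keying by userId alone: every event for user 1000 carries the same key.
user_only = partitions_used(["1000" for _ in ips], NUM_PARTITIONS)
# Keying by userId-ipAddress: each IP produces a different key.
composite = partitions_used([f"1000-{ip}" for ip in ips], NUM_PARTITIONS)

print("userId key     ->", sorted(user_only))
print("userId-ip key  ->", sorted(composite))
```

With userId as the key, all of a user's events share one partition regardless of how many partitions the topic has. With the composite key, each distinct IP hashes independently, so the same user's events may be spread over several partitions.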

With that, you should be able to process all messages in your system. If traffic grows, you can add more Kafka brokers and use the Replication Tools to change leaders/replicas for partitions. If traffic grows more than tenfold, you must create new partitions.
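
One thing to keep in mind before adding partitions: because the partition is derived from hash(key) % number_of_partitions, changing the count can send an existing key to a different partition than before. The sketch below illustrates this under the same assumptions as the formula above, with zlib.crc32 as a stand-in hash (not Kafka's murmur2) and hypothetical helper names and user IDs.

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition id as hash(key) % num_partitions."""
    # Assumption: crc32 is a stand-in for a stable hash; Kafka itself uses murmur2.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def moved_keys(keys, old_count: int, new_count: int):
    """Return the keys whose partition differs between the two partition counts."""
    return [
        k for k in keys
        if partition_for(k, old_count) != partition_for(k, new_count)
    ]

# Hypothetical user IDs used as partition keys.
user_ids = [str(uid) for uid in range(1000, 2000)]

moved = moved_keys(user_ids, old_count=10, new_count=20)
print(f"{len(moved)} of {len(user_ids)} keys change partition when growing 10 -> 20")
```

If per-user ordering matters to the consumers, this is a reason to over-provision partitions at creation time rather than to rely on growing the topic later.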
