How to choose the number of partitions for a Kafka topic?


Question

We have a 3-node ZooKeeper cluster and 7 brokers. We now need to create a topic and decide how many partitions to give it.

However, I could not find any formula for deciding how many partitions I should create for this topic. The producer rate is 5k messages/sec and each message is 130 bytes.

Thanks in advance.

Recommended answer

This old benchmark by a Kafka co-founder gives a good sense of the orders of magnitude involved: https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines

The immediate conclusion from this, as Vanlightly said above, is that consumer processing time is the most important factor in deciding the number of partitions, since your load is nowhere near challenging the producer throughput.
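To make the "nowhere near" part concrete, here is a rough back-of-the-envelope sketch. It uses the rate and message size from the question and takes the 2 million writes per second in the benchmark's title as the comparison point; the variable names are only illustrative, and the benchmark's hardware and message size differ from yours, so treat the ratio as an order-of-magnitude indication.

```python
# Figures taken from the question.
messages_per_sec = 5_000
message_size_bytes = 130

# Headline figure from the benchmark's title (three machines in total).
# Assumption: used here only as a rough order-of-magnitude reference.
benchmark_writes_per_sec = 2_000_000

# Raw ingress volume the topic has to absorb.
ingress_bytes_per_sec = messages_per_sec * message_size_bytes

# How much of the benchmark's write rate this workload represents.
fraction_of_benchmark = messages_per_sec / benchmark_writes_per_sec

print(f"Ingress: {ingress_bytes_per_sec / 1000:.0f} kB/s")              # Ingress: 650 kB/s
print(f"Share of benchmark write rate: {fraction_of_benchmark:.2%}")    # 0.25%
```

At roughly 650 kB/s, the write side is a small fraction of what the benchmark reports, which is why the consumer side drives the sizing.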

The maximum concurrency for consuming is the number of partitions, so you want to make sure that:

((time to process one message, in seconds x messages per second) / number of partitions) << 1

If it equals 1, you cannot read faster than messages are written, and that is before accounting for bursts of messages and consumer failures or downtime. So you need it to be significantly lower than 1; how much lower depends on the latency your system can tolerate.
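As a minimal sketch of how the inequality turns into a partition count, the snippet below solves it for the number of partitions. The 5k messages/sec comes from the question; the 2 ms processing time and the target load of 0.5 are purely illustrative assumptions (measure your own consumer and pick a headroom that matches your latency tolerance), and the function names are hypothetical.

```python
import math

def consumer_load(processing_time_sec, messages_per_sec, partitions):
    """Left-hand side of the inequality: should stay well below 1."""
    return processing_time_sec * messages_per_sec / partitions

def min_partitions(processing_time_sec, messages_per_sec, target_load):
    """Smallest partition count that keeps the load at or below target_load."""
    required = processing_time_sec * messages_per_sec / target_load
    # Round first so floating-point noise does not push the ceiling up by one.
    return max(1, math.ceil(round(required, 9)))

messages_per_sec = 5_000      # from the question
processing_time_sec = 0.002   # assumption: 2 ms to handle one message
target_load = 0.5             # assumption: keep consumers about half busy

partitions = min_partitions(processing_time_sec, messages_per_sec, target_load)
load = consumer_load(processing_time_sec, messages_per_sec, partitions)

print(f"partitions needed: {partitions}")
print(f"resulting load:    {load:.2f}")
```

With these assumed numbers, fully serial consumption would need 10 consumer-seconds of work per second, so the count lands at 20 partitions for a load of 0.5. A lower target load buys more room for bursts and consumer downtime at the cost of more partitions.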
