Can I have 100s of thousands of topics in a Kafka Cluster?

Question

I have a data flow use case where I want to have topics defined based on each of the customer repositories (which might be in the order of 100,000s). Each data flow would be a topic with partitions (in the order of a few 10s) defining the different stages of the flow.

Is Kafka good for a scenario like this? If not, how would I remodel my use case to handle such scenarios? Also, each customer repository's data cannot be mingled with others, even during processing.

Answer

Update March 2021: With Kafka's new KRaft mode*, which entirely removes ZooKeeper from Kafka's architecture, a Kafka cluster can handle millions of topics/partitions. See https://www.confluent.io/blog/kafka-without-zookeeper-a-sneak-peek/ for details.

*Short for "Kafka Raft metadata mode"; available as early access since Kafka v2.8.

Update September 2018: As of Kafka v2.0, a Kafka cluster can have hundreds of thousands of topics. See https://blogs.apache.org/kafka/entry/apache-kafka-supports-more-partitions.

Original answer below:

The rule of thumb is that the number of Kafka topics can be in the thousands.

Jun Rao (Kafka committer; now at Confluent, but formerly on LinkedIn's Kafka team) wrote:

At LinkedIn, our largest cluster has more than 2K topics. 5K topics should be fine. [...]

With more topics, you may hit one of those limits: (1) # dirs allowed in a FS; (2) open file handles (we keep all log segments open in the broker); (3) ZK nodes.

The Kafka FAQ gives the following abstract guideline:

Kafka FAQ: How many topics can I have?

Unlike many messaging systems, Kafka topics are meant to scale up arbitrarily. Hence we encourage fewer large topics rather than many small topics. So, for example, if we were storing notifications for users, we would encourage a design with a single notifications topic partitioned by user id rather than a separate topic per user.
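To make the FAQ's suggestion concrete, here is a minimal sketch using the standard Java producer API. The topic name "notifications", the customer id, and the broker address are assumptions for illustration only; the point is that all of a customer's events go to one shared topic, keyed by customer id, so records for the same customer always land on the same partition and stay ordered relative to each other.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class NotificationsProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical broker address for illustration.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // One shared topic; the customer id is the record key, so Kafka's
            // default partitioner hashes it and every event for the same
            // customer goes to the same partition (preserving per-customer order).
            String topic = "notifications";
            String customerId = "customer-42";
            producer.send(new ProducerRecord<>(topic, customerId, "stage-1 completed"));
            producer.send(new ProducerRecord<>(topic, customerId, "stage-2 completed"));
        }
    }
}
```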

The actual scalability is for the most part determined by the total number of partitions across all topics, not the number of topics itself (see the question below for details).
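As a rough illustration of why the partition total is the number to watch, the sketch below counts the partitions across all topics in a cluster using the Java AdminClient (the broker address is an assumption):

```java
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.TopicDescription;

public class PartitionCount {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Hypothetical broker address for illustration.
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // List every topic, then describe them to get per-topic partition info.
            Set<String> topics = admin.listTopics().names().get();
            Map<String, TopicDescription> descriptions = admin.describeTopics(topics).all().get();

            int totalPartitions = descriptions.values().stream()
                    .mapToInt(d -> d.partitions().size())
                    .sum();

            System.out.printf("%d topics, %d partitions in total%n", topics.size(), totalPartitions);
        }
    }
}
```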

The article http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/ (written by the aforementioned Jun Rao) adds further details, and particularly focuses on the impact of the number of partitions.

IMHO your use case / model is a bit of a stretch for a single Kafka cluster, though not necessarily for Kafka in general. With the little information you shared (I understand that a public forum is not the best place for sensitive discussions :-P), the only off-the-hip comment I can provide is to consider using more than one Kafka cluster, because you mentioned that customer data must be strongly isolated anyway (including during the processing steps).
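If you do split across clusters, one simple pattern is to route each group of customers to a producer configured against its own cluster. The cluster addresses, topic name, and customer-to-cluster mapping below are purely hypothetical; this is only a sketch of the routing idea, not a prescribed layout:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class PerClusterRouting {
    // Hypothetical mapping of customer groups to dedicated clusters.
    private static final Map<String, String> CLUSTER_BY_GROUP = Map.of(
            "group-a", "kafka-a.example.com:9092",
            "group-b", "kafka-b.example.com:9092");

    private static final Map<String, KafkaProducer<String, String>> producers = new HashMap<>();

    // One producer per customer group, created lazily and reused.
    static KafkaProducer<String, String> producerFor(String group) {
        return producers.computeIfAbsent(group, g -> {
            Properties props = new Properties();
            props.put("bootstrap.servers", CLUSTER_BY_GROUP.get(g));
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());
            return new KafkaProducer<>(props);
        });
    }

    public static void main(String[] args) {
        // Events for different customer groups never share a cluster.
        producerFor("group-a").send(new ProducerRecord<>("flow-events", "customer-1", "stage-1"));
        producerFor("group-b").send(new ProducerRecord<>("flow-events", "customer-2", "stage-1"));
        producers.values().forEach(KafkaProducer::close);
    }
}
```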

I hope this helps!
