Can I have 100s of thousands of topics in a Kafka Cluster?

Question

I have a data flow use case where I want to have topics defined based on each of the customer repositories (which might be on the order of 100,000s). Each data flow would be a topic with partitions (on the order of a few tens) defining the different stages of the flow.

Is Kafka a good fit for a scenario like this? If not, how would I remodel my use case to handle it? Also, each customer repository's data cannot be mingled with any other customer's data, even during processing.

Answer

Update Sep 2018: Today, as of Kafka v2.0, a Kafka cluster can have hundreds of thousands of topics. See https://blogs.apache.org/kafka/entry/apache-kafka-supports-more-partitions.

Original answer below, for posterity:

The rule of thumb is that a number of Kafka topics in the thousands is fine.

Jun Rao (Kafka committer; now at Confluent but he was formerly in LinkedIn's Kafka team) wrote:

At LinkedIn, our largest cluster has more than 2K topics. 5K topics should be fine. [...]

With more topics, you may hit one of those limits: (1) # dirs allowed in a FS; (2) open file handlers (we keep all log segments open in the broker); (3) ZK nodes.
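To see why these limits matter for the use case in the question, here is a rough back-of-envelope estimate. It is a sketch under hypothetical assumptions, not a measurement: 100,000 customer topics with 20 partitions each, replication factor 3, 30 brokers with replicas spread evenly, one segment per partition, and three files per segment (`.log`, `.index`, `.timeindex`).

```python
# Back-of-envelope estimate of per-broker load for a topic-per-customer design.
# All inputs below are hypothetical assumptions for illustration.


def estimate_broker_load(topics, partitions_per_topic, replication_factor,
                         brokers, segments_per_partition=1, files_per_segment=3):
    """Return (partition_replicas_per_broker, open_files_per_broker).

    Assumes replicas are spread evenly across brokers, that each partition
    replica is one directory on its broker, and that every segment keeps
    `files_per_segment` files open.
    """
    total_replicas = topics * partitions_per_topic * replication_factor
    replicas_per_broker = total_replicas // brokers
    open_files_per_broker = (replicas_per_broker * segments_per_partition
                             * files_per_segment)
    return replicas_per_broker, open_files_per_broker


# 100,000 customer topics x 20 partitions, replicated 3x over 30 brokers.
dirs, files = estimate_broker_load(100_000, 20, 3, 30)
print(f"partition directories per broker: {dirs:,}")
print(f"open segment files per broker:    {files:,}")
```

Under these assumptions each broker would host about 200,000 partition directories and keep about 600,000 segment files open, far above common default open-file limits, and the file count only grows as partitions accumulate more segments.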

The Kafka FAQ gives the following abstract guideline:

Kafka FAQ: How many topics can I have?

Unlike many messaging systems Kafka topics are meant to scale up arbitrarily. Hence we encourage fewer large topics rather than many small topics. So for example if we were storing notifications for users we would encourage a design with a single notifications topic partitioned by user id rather than a separate topic per user.

The actual scalability is for the most part determined by the number of total partitions across all topics not the number of topics itself (see the question below for details).
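A minimal sketch of the FAQ's recommendation: one shared topic where the user (or customer) id is the record key, so every record for a given key lands in the same partition. Kafka's default partitioner hashes keyed records with murmur2; this sketch swaps in `zlib.crc32` as a stand-in, and `NUM_PARTITIONS`, `partition_for` and `route` are hypothetical names used only for illustration.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 12  # hypothetical partition count for the shared topic


def partition_for(user_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a record key to a partition.

    CRC32 is a stand-in for Kafka's murmur2; the property that matters is
    the same: a given key always maps to the same partition.
    """
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions


def route(notifications):
    """Group (user_id, message) pairs by the partition they would land in."""
    partitions = defaultdict(list)
    for user_id, message in notifications:
        partitions[partition_for(user_id)].append((user_id, message))
    return partitions


events = [("alice", "welcome"), ("bob", "invoice ready"),
          ("alice", "password changed")]
for p, records in sorted(route(events).items()):
    print(p, records)
```

Because a key always maps to one partition, per-user ordering is preserved. Note that each partition still carries many users' records, so this design on its own does not give the strict per-customer isolation asked for in the question.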

The article http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/ (written by the aforementioned Jun Rao) adds further details, and particularly focuses on the impact of the number of partitions.
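One rule of thumb from that article is throughput-based: if a single partition can sustain `p` MB/s of production and `c` MB/s of consumption, and the target throughput is `t`, you need at least `max(t/p, t/c)` partitions. The sketch below only encodes that formula; the throughput figures are hypothetical and would have to be measured on your own hardware.

```python
import math


def min_partitions(target_mb_s: float,
                   producer_mb_s_per_partition: float,
                   consumer_mb_s_per_partition: float) -> int:
    """Lower bound on partition count: max(t/p, t/c), rounded up."""
    return math.ceil(max(target_mb_s / producer_mb_s_per_partition,
                         target_mb_s / consumer_mb_s_per_partition))


# Hypothetical measurements: 100 MB/s target, 10 MB/s producing and
# 20 MB/s consuming per partition -> max(10, 5) = 10 partitions.
print(min_partitions(100, 10, 20))
```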

IMHO your use case / model is a bit of a stretch for a single Kafka cluster, though not necessarily for Kafka in general. With the little information you shared (I understand that a public forum is not the best place for sensitive discussions :-P), the only off-the-cuff comment I can provide is to consider using more than one Kafka cluster, because you mentioned that customer data must be strictly isolated anyway (including the processing steps).
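If you do split customers across clusters, one way to keep routing manageable is an explicit customer-to-cluster table that producers and consumers consult before connecting. The sketch below is hypothetical throughout: `ClusterConfig`, the cluster names and addresses, the customer ids and the `<customer_id>.flow` topic naming are placeholders, not part of any Kafka API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterConfig:
    name: str
    bootstrap_servers: str


# Hypothetical clusters; names and addresses are placeholders.
CLUSTERS = {
    "cluster-a": ClusterConfig("cluster-a", "kafka-a.example.internal:9092"),
    "cluster-b": ClusterConfig("cluster-b", "kafka-b.example.internal:9092"),
}

# Hypothetical explicit assignment of customers to clusters.
CUSTOMER_TO_CLUSTER = {
    "customer-001": "cluster-a",
    "customer-002": "cluster-b",
}


def cluster_for(customer_id: str) -> ClusterConfig:
    """Look up the cluster that owns a customer's data."""
    try:
        return CLUSTERS[CUSTOMER_TO_CLUSTER[customer_id]]
    except KeyError:
        raise KeyError(f"no cluster assigned for {customer_id!r}") from None


def topic_for(customer_id: str) -> str:
    """Hypothetical naming convention: one flow topic per customer."""
    return f"{customer_id}.flow"


cfg = cluster_for("customer-002")
print(cfg.bootstrap_servers, topic_for("customer-002"))
```

The `bootstrap_servers` value is what you would pass as the client's `bootstrap.servers` setting. An explicit table, rather than hashing customer ids onto clusters, lets you move a customer or give a large customer a dedicated cluster without reshuffling everyone else, and each cluster only has to carry its own share of the per-customer topics.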

I hope this helps!
