Is scalability applicable with Kafka Streams if each topic has a single partition?


Question

My understanding from the Kafka Streams documentation is that the maximum number of parallel tasks equals the maximum number of partitions of any single topic among all topics in the cluster.

I have around 60 topics in my Kafka cluster, and each topic has only a single partition. Is it possible to achieve scalability/parallelism with Kafka Streams for this cluster?

Recommended answer

Do you want to do the same computation over all topics? If so, I would recommend introducing an extra topic with many partitions that you use to scale out:

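import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

// using the new 1.0 API
StreamsBuilder builder = new StreamsBuilder();
KStream parallelizedStream = builder
    .stream(/* subscribe to all topics at once */)
    // write to, and read back from, the extra topic with many partitions
    .through("topic-with-many-partitions");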

// apply computation
parallelizedStream...

Note: You need to create the topic "topic-with-many-partitions" manually before starting your Streams application.

Pro tip:

The topic "topic-with-many-partitions" can have a very short retention time, because it is only used for scaling and does not need to hold data long term.

Update

If you have 10 topics, T1 to T10, each with a single partition, the program above will execute as follows (TN is the dummy topic with 10 partitions):

T1-0  --+           +--> TN-0  --> T1_0
...   --+--> T0_0 --+--> ...   --> ...
T10-0 --+           +--> TN-9  --> T1_9

The first part of your program only reads all 10 input topics and writes the data into the 10 partitions of TN. After that, you can get up to 10 parallel tasks, each processing one partition of TN. If you start 10 KafkaStreams instances, only one of them will execute T0_0, and each of them will also have one T1_x running.
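
The task layout described above can be reproduced with a small toy model. This is only a sketch and does not use the Kafka Streams API: the class `TaskLayoutSketch` and its methods are hypothetical names, partitions are numbered from 0 as Kafka does, and the round-robin spreading is a simplifying assumption that stands in for the real task assignor. It relies on the rule that a sub-topology gets one task per partition of its widest input topic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the task layout; hypothetical helper, not part of Kafka Streams. */
final class TaskLayoutSketch {

    /** A sub-topology gets as many tasks as its widest input topic has partitions. */
    static int taskCount(List<Integer> inputPartitionCounts) {
        int max = 0;
        for (int count : inputPartitionCounts) {
            max = Math.max(max, count);
        }
        return max;
    }

    /** Builds task ids in the "<subTopology>_<partition>" form, starting at partition 0. */
    static List<String> taskIds(int subTopology, int tasks) {
        List<String> ids = new ArrayList<>();
        for (int partition = 0; partition < tasks; partition++) {
            ids.add(subTopology + "_" + partition);
        }
        return ids;
    }

    /** Spreads tasks round-robin over instances (an assumption, not the real assignor). */
    static Map<Integer, List<String>> assign(List<String> tasks, int instances) {
        Map<Integer, List<String>> byInstance = new LinkedHashMap<>();
        for (int i = 0; i < instances; i++) {
            byInstance.put(i, new ArrayList<>());
        }
        for (int t = 0; t < tasks.size(); t++) {
            byInstance.get(t % instances).add(tasks.get(t));
        }
        return byInstance;
    }

    public static void main(String[] args) {
        // Sub-topology 0 reads T1..T10, which have one partition each.
        List<String> all = new ArrayList<>(taskIds(0, taskCount(Collections.nCopies(10, 1))));
        // Sub-topology 1 reads TN, which has 10 partitions.
        all.addAll(taskIds(1, taskCount(Collections.singletonList(10))));
        // Distribute all tasks over 10 application instances and print the layout.
        assign(all, 10).forEach((instance, ids) ->
                System.out.println("instance " + instance + ": " + ids));
    }
}
```

With 10 instances, the model places the single task of the first sub-topology on one instance and exactly one task of the second sub-topology on every instance, which matches the description above.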
