如果每个主题都有单个分区,可扩展性是否适用于 Kafka 流 [英] Is scalability applicable with Kafka stream if each topic has single partition

查看:23
本文介绍了如果每个主题都有单个分区,可扩展性是否适用于 Kafka 流的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我根据 Kafka 流文档的理解,最大可能的并行任务数等于集群中所有主题中某个主题的最大分区数.

我在 Kafka 集群中有大约 60 个主题.每个主题只有一个分区.是否可以通过 Kafka 流为我的 Kafka 集群实现可扩展性/并行性?

解决方案

您想对所有主题进行相同的计算吗?为此,我建议引入一个额外的主题,其中包含许多用于横向扩展的分区:

//使用新的 1.0 APIStreamsBuilder 构建器 = 新的 StreamsBuilder():KStream 并行化流 = 构建器.stream(/* 一次订阅所有主题*/).through("topic-with-many-partitions");//应用计算并行化流...

<块引用>

注意:您需要在启动 Streams 应用程序之前手动创建主题topic-with-many-partitions"

专业提示:

<块引用>

topic-with-many-partitions"主题的保留时间可能非常短,因为它仅用于扩展,不得长期保存数据.

更新

如果你有 10 个主题 T1 到 T10,每个主题都有一个分区,上面的程序将执行如下(TN 是有 10 个分区的虚拟主题):

T1-0 --+ +-->TN-0 -->T1_1... ---+-->T0_0 ---+-->... -->...T10-0 --+ +-->TN-10 -->T1_10

程序的第一部分只会读取所有 10 个输入主题并将其写回 TN 的 10 个分区.之后,您最多可以获得 10 个并行任务,每个任务处理一个输入分区.如果您启动 10 个 KafakStreams 实例,则只有一个会执行 T0_0,每个实例也会运行一个 T1_x.

My understanding as per Kafka stream documentation, Maximum possible parallel tasks is equal to maximum number of partitions of a topic among all topics in a cluster.

I have around 60 topics at Kafka cluster. Each topic has single partition only. Is it possible to achieve scalability/parallelism with Kafka stream for my Kafka cluster?

解决方案

Do you want to do the same computation over all topics? For this, I would recommend to introduce an extra topic with many partitions that you use to scale out:

// using new 1.0 API
StreamsBuilder builder = new StreamsBuilder():
KStream parallelizedStream = builder
    .stream(/* subscribe to all topics at once*/)
    .through("topic-with-many-partitions");

// apply computation
parallelizedStream...

Note: You need to create the topic "topic-with-many-partitions" manually before starting your Streams application

Pro Tip:

The topic "topic-with-many-partitions" can have a very short retention time as it's only used for scaling and must not hold data long term.

Update

If you have 10 topic T1 to T10 with a single partitions each, the program from above will execute as follows (with TN being the dummy topic with 10 partitions):

T1-0  --+           +--> TN-0   --> T1_1
...   --+--> T0_0 --+--> ...    --> ...
T10-0 --+           +--> TN-10  --> T1_10

The first part of your program will only read all 10 input topics and write it back into 10 partitions of TN. Afterwards, you can get up to 10 parallel tasks, each processing one input partition. If you start 10 KafakStreams instances, only one will execute T0_0, and each will alsa one T1_x running.

这篇关于如果每个主题都有单个分区,可扩展性是否适用于 Kafka 流的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆