Ideal value for Kafka Connect Distributed tasks.max configuration setting?

Problem Description

I am looking to productionize and deploy my Kafka Connect application. However, I have two questions about the tasks.max setting, which is required and of high importance, but the details of what to actually set this value to are vague.

If I have a topic with n partitions that I wish to consume data from and write to some sink (in my case, I am writing to S3), what should I set tasks.max to? Should I set it to n? Should I set it to 2n? Intuitively it seems that I'd want to set the value to n and that's what I've been doing.

What if I change my Kafka topic and increase the number of partitions on the topic? If I set tasks.max to n, will I have to pause my Kafka connector and increase it? If I have set a value of 2n, will my connector then automatically increase its parallelism?

Solution

In a Kafka Connect sink, the tasks are essentially consumer threads that receive partitions to read from. If you have 10 partitions and tasks.max is set to 5, each task will receive 2 partitions to read from and will track their offsets. If you have configured tasks.max to a number above the partition count, Connect will launch a number of tasks equal to the number of partitions of the topics it is reading.
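
For instance, with a 10-partition topic you could simply set tasks.max to the partition count. A minimal sketch of such a sink config, as it might be submitted to the distributed Connect REST API, is shown below; it assumes the Confluent S3 sink connector, and the connector, topic, and bucket names are placeholders:

    {
      "name": "s3-sink",
      "config": {
        "connector.class": "io.confluent.connect.s3.S3SinkConnector",
        "topics": "my-topic",
        "tasks.max": "10",
        "s3.bucket.name": "my-bucket",
        "s3.region": "us-east-1",
        "storage.class": "io.confluent.connect.s3.storage.S3Storage",
        "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
        "flush.size": "1000"
      }
    }

With tasks.max at 10 here, each of the 10 partitions gets its own task; per the behavior described above, a value of 20 would still result in only 10 tasks.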

If you change the partition count of the topic, you'll have to relaunch your Connect task; if tasks.max is still greater than the new partition count, Connect will again start one task per partition.
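
In distributed mode, "relaunching" is normally done through the Connect REST API rather than by restarting worker processes. A hedged sketch follows; the connector name s3-sink, the worker address localhost:8083, and the file s3-sink-config-map.json (a flat key/value JSON map, i.e. just the contents of the "config" object above) are assumptions:

    # Re-submit the connector's config map; Connect updates the connector
    # and reconfigures/restarts its tasks with the new settings.
    curl -X PUT -H "Content-Type: application/json" \
      --data @s3-sink-config-map.json \
      http://localhost:8083/connectors/s3-sink/config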

Edit: just discovered ConnectorContext: https://kafka.apache.org/0100/javadoc/org/apache/kafka/connect/connector/ConnectorContext.html

The connector will have to be written to make use of this, but it looks like Connect has the ability to reconfigure a connector if there's a topic change (partitions added/removed).
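
As a rough illustration, a connector could run a small monitor that watches the topic's partition count and asks the framework to recompute task assignments when it changes. Only ConnectorContext.requestTaskReconfiguration() below comes from the linked Connect API; the class, the polling loop, and the partition-count supplier are hypothetical:

    import java.util.function.IntSupplier;
    import org.apache.kafka.connect.connector.ConnectorContext;

    // Hypothetical helper: polls the partition count of the consumed topic and
    // requests a task reconfiguration from the Connect framework when it changes.
    public class PartitionChangeMonitor implements Runnable {

        private final ConnectorContext context;
        private final IntSupplier partitionCount; // e.g. backed by an AdminClient describeTopics lookup
        private final long pollIntervalMs;
        private volatile boolean running = true;
        private int lastKnownCount;

        public PartitionChangeMonitor(ConnectorContext context, IntSupplier partitionCount, long pollIntervalMs) {
            this.context = context;
            this.partitionCount = partitionCount;
            this.pollIntervalMs = pollIntervalMs;
            this.lastKnownCount = partitionCount.getAsInt();
        }

        @Override
        public void run() {
            while (running) {
                int current = partitionCount.getAsInt();
                if (current != lastKnownCount) {
                    lastKnownCount = current;
                    // Asks Connect to call the connector's taskConfigs() again and
                    // redistribute work across the (possibly new number of) tasks.
                    context.requestTaskReconfiguration();
                }
                try {
                    Thread.sleep(pollIntervalMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }

        public void stop() {
            running = false;
        }
    }

A connector would typically start such a monitor thread in start() and stop it in stop(); the framework then calls the connector's taskConfigs() again so the work can be redistributed.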
