Can the Kafka Connect mongo source run as a cluster (tasks.max > 1)?
Question
I'm using the following mongo-source, which is supported by kafka-connect. I found that one of the configuration options of the mongo source (from here) is tasks.max.
This means I can give the connector a tasks.max value greater than 1, but I don't understand what it will do behind the scenes.
If it creates multiple tasks that all listen to the same MongoDB change stream, I will end up with duplicate messages. So, does mongo-source really support parallelism and work as a cluster? What does it do when tasks.max is greater than 1?
Recommended answer
Mongo-source doesn't support tasks.max > 1. Even if you set it to a value greater than 1, only one task will pull data from MongoDB into Kafka.
How many tasks are created depends on the particular connector. The method List<Map<String, String>> Connector::taskConfigs(int maxTasks), which you override when implementing a connector, returns a list whose size determines the number of tasks.
If you check the mongo-kafka source connector, you will see that it returns a singletonList.
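To illustrate why the size of the returned list matters, here is a minimal standalone sketch. It does not use the Kafka Connect library and is not the mongo-kafka source code: the class TaskConfigsSketch and both methods are hypothetical stand-ins that only mimic the shape of taskConfigs(int maxTasks). The first method ignores maxTasks and returns a singletonList, as described above; the second shows, for contrast, how a connector with independent partitions could return up to maxTasks configs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TaskConfigsSketch {

    // Hypothetical single-task connector: maxTasks is ignored and exactly one
    // task config is returned, so only one task is ever started.
    static List<Map<String, String>> singleTaskConfigs(Map<String, String> settings, int maxTasks) {
        return Collections.singletonList(new HashMap<>(settings));
    }

    // Hypothetical contrast: a connector whose input splits into independent
    // partitions can hand out up to maxTasks configs (round-robin assignment).
    static List<Map<String, String>> partitionedTaskConfigs(List<String> partitions, int maxTasks) {
        int groups = Math.min(partitions.size(), maxTasks);
        List<Map<String, String>> configs = new ArrayList<>();
        for (int i = 0; i < groups; i++) {
            List<String> assigned = new ArrayList<>();
            for (int j = i; j < partitions.size(); j += groups) {
                assigned.add(partitions.get(j));
            }
            Map<String, String> config = new HashMap<>();
            config.put("partitions", String.join(",", assigned));
            configs.add(config);
        }
        return configs;
    }

    public static void main(String[] args) {
        Map<String, String> settings = new HashMap<>();
        settings.put("connection.uri", "mongodb://localhost:27017"); // placeholder value

        // One config regardless of the requested maximum.
        System.out.println(singleTaskConfigs(settings, 5).size());

        // Two configs when three partitions are spread over at most two tasks.
        System.out.println(partitionedTaskConfigs(List.of("a", "b", "c"), 2).size());
    }
}
```

With a single change stream there is nothing to split, so returning one config avoids several tasks reading the same stream and producing duplicate messages.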