Kafka-Connect: Creating a new connector in distributed mode is creating a new group


Problem Description

I am currently working with the Confluent 3.0.1 platform. I am trying to create 2 connectors on two different workers, but creating a new connector creates a new group for it.

Two connectors were created using the details below:

1) POST http://devmetric.com:8083/connectors

{
    "name": "connector1",
    "config": {
        "connector.class": "com.xxx.kafka.connect.sink.DeliverySinkConnector",
        "tasks.max": "1",
        "topics": "dev.ps_primary_delivery",
        "elasticsearch.cluster.name": "ad_metrics_store",
        "elasticsearch.hosts": "devkafka1.com:9300",
        "elasticsearch.bulk.size": "100",
        "tenants": "tenant1"
    }
}

2) POST http://devkafka01.com:8083/connectors

{
    "name": "connector2",
    "config": {
        "connector.class": "com.xxx.kafka.connect.sink.DeliverySinkConnector",
        "tasks.max": "1",
        "topics": "dev.ps_primary_delivery",
        "elasticsearch.cluster.name": "ad_metrics_store",
        "elasticsearch.hosts": "devkafka.com:9300",
        "elasticsearch.bulk.size": "100",
        "tenants": "tenant1"
    }
}

But both of them were created under different group IDs. After this I queried the existing groups.

$ sh ./bin/kafka-consumer-groups --bootstrap-server devmetric.com:9091  --new-consumer  --list

Result was:
connect-connector2
connect-connector1

These groups were created by Kafka Connect automatically and were not given by me. I had given a different group.id in worker.properties. But I wanted both connectors to be under the same group so that they work in parallel and share the messages. As of now I have 1 million records on the topic "dev.ps_primary_delivery" and I want each connector to get 0.5 million.

Please tell me how to achieve this.

Recommended Answer

I think some clarification is required...

1. group.id in the worker.properties file does not refer to consumer groups. It is a "worker group": multiple workers in the same worker group will split the work between them, so if the same connector has many tasks (for example, the JDBC connector has a task for every table), those tasks will be allocated across all workers in the group.
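If it helps, here is a minimal sketch of a distributed worker configuration that would put your two workers into the same worker group. Only group.id and the internal storage topics need to match across workers; the group name, converter choices, and topic names below are assumptions, and the bootstrap server is taken from the question.

# worker.properties - sketch only, values here are assumptions
bootstrap.servers=devmetric.com:9091
# the same group.id on every worker puts them all in one worker group
group.id=connect-cluster-1
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# the three internal storage topics must also match across workers in the group
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status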

Sink connectors do have consumers that are part of a consumer group. The group.id of this group is always "connect-" + the connector name. In your case, you got "connect-connector1" and "connect-connector2" based on your connector names. This also means that the only way two connectors will be in the same group is if they have the same name. But names are unique, so you can't have two connectors in the same group. The reason is...
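You can check which topic partitions each connector's consumers own with the same kafka-consumer-groups tool used above, this time with the --describe option and the connect-<connector name> group, for example:

$ sh ./bin/kafka-consumer-groups --bootstrap-server devmetric.com:9091 --new-consumer --describe --group connect-connector1

Because connector1 and connector2 form two separate consumer groups on the same topic, each group is assigned all partitions of dev.ps_primary_delivery, which is why both connectors receive every message instead of splitting them.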

Connectors don't really get events themselves, they just start a bunch of tasks. Each task has consumers that are part of the connector's consumer group, and each task handles a subset of the topics and partitions independently. So having two connectors in the same group would basically mean that all their tasks are part of the same group - so why do you need two connectors? Just configure more topics and more tasks for that one connector and you are all set.
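As a sketch of that suggestion, the two connector definitions from the question could be collapsed into a single one with tasks.max raised to 2; the connector name and the choice of Elasticsearch host here are assumptions, the rest is copied from the question:

POST http://devmetric.com:8083/connectors

{
    "name": "delivery-sink",
    "config": {
        "connector.class": "com.xxx.kafka.connect.sink.DeliverySinkConnector",
        "tasks.max": "2",
        "topics": "dev.ps_primary_delivery",
        "elasticsearch.cluster.name": "ad_metrics_store",
        "elasticsearch.hosts": "devkafka1.com:9300",
        "elasticsearch.bulk.size": "100",
        "tenants": "tenant1"
    }
}

Whether both tasks are actually started depends on the DeliverySinkConnector implementation honoring tasks.max, which is exactly the exception discussed next.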

The only exception is if the connector you are using doesn't use tasks correctly or limits you to just one task. In that case, either they have a good reason or (more likely) someone needs to improve their connector...

