Reading from multiple broker Kafka with Flink


Problem Description

I want to read from a multi-broker Kafka cluster with Flink.

I have a cluster of 3 machines for Kafka, with the following topic:

Topic:myTopic   PartitionCount:3    ReplicationFactor:1 Configs:
Topic: myTopic  Partition: 0    Leader: 2   Replicas: 2 Isr: 2
Topic: myTopic  Partition: 1    Leader: 0   Replicas: 0 Isr: 0
Topic: myTopic  Partition: 2    Leader: 1   Replicas: 1 Isr: 1

From Flink I execute the following code:

Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "x.x.x.x:9092,x.x.x.x:9092,x.x.x.x:9092");
properties.setProperty("group.id", "flink");

DataStream<String> stream = env.addSource(
        new FlinkKafkaConsumer09<>("myTopic", new SimpleStringSchema(), properties));
stream.map(....);  // transformation omitted in the question
env.execute();

I launch the same job 3 times.

If I execute this code with one broker it works well, but with 3 brokers (on 3 different machines) only one partition is read.

The solution proposed (in this question) was

to create separate instances of the FlinkKafkaConsumer for each cluster (that's what you are already doing), and then union the resulting streams

That doesn't work for me.

So my questions are:

  1. Am I missing something?
  2. If we add a new machine to the Kafka cluster, do we need to change Flink's code to add a consumer for the new broker? Or can this be handled automatically at runtime?

Recommended Answer

It seems you've misunderstood the concept of Kafka's distributed streams.

A Kafka topic consists of several partitions (3 in your case). Each consumer can consume one or more of these partitions. If you start 3 instances of your app with the same group.id, each consumer will indeed read data from just one partition – the group tries to distribute the load evenly, so it's one partition per consumer.
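The one-partition-per-consumer behaviour can be sketched with a toy round-robin assignor. This is a simplification of my own, not Kafka's actual assignment code (Kafka uses pluggable assignors such as range and round-robin), and the class and member names below are made up:

```java
import java.util.*;

// Toy model of a consumer group: partitions of a topic are split
// round-robin across the members that share the same group.id.
public class GroupAssignmentSketch {
    static Map<String, List<Integer>> assign(List<String> consumers, int partitionCount) {
        List<String> sorted = new ArrayList<>(consumers);
        Collections.sort(sorted);  // deterministic order of group members
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String c : sorted) assignment.put(c, new ArrayList<>());
        for (int p = 0; p < partitionCount; p++) {
            // partition p goes to member (p mod groupSize)
            assignment.get(sorted.get(p % sorted.size())).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // 3 job instances joining one group against myTopic's 3 partitions:
        System.out.println(assign(Arrays.asList("job-1", "job-2", "job-3"), 3));
        // prints {job-1=[0], job-2=[1], job-3=[2]} -- one partition each,
        // which is exactly why each of your 3 jobs sees only one partition
    }
}
```

With a single member in the group, the same sketch hands all 3 partitions to that one consumer, which matches the behaviour you saw with one broker.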

I recommend reading more about this topic, especially about the concept of consumer groups, in the Kafka documentation.

Anyway, FlinkKafkaConsumer09 can run in multiple parallel instances, each of which will pull data from one or more Kafka partitions. You don't need to worry about creating more instances of the consumer: one consumer instance can pull records from all of the partitions.

I have no idea why you're starting the job 3 times instead of once with parallelism set to 3. That would solve your problem:

DataStream<T> stream =
      env.addSource(new FlinkKafkaConsumer09<>("myTopic", new SimpleStringSchema(), properties))
              .setParallelism(3);
