Reading from multiple Kafka brokers with Flink


Problem description

I want to read from multiple Kafka brokers with Flink.

I have a Kafka cluster of 3 machines, with the following topic:

Topic:myTopic   PartitionCount:3    ReplicationFactor:1 Configs:
Topic: myTopic  Partition: 0    Leader: 2   Replicas: 2 Isr: 2
Topic: myTopic  Partition: 1    Leader: 0   Replicas: 0 Isr: 0
Topic: myTopic  Partition: 2    Leader: 1   Replicas: 1 Isr: 1

From Flink I execute the following code:

Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "x.x.x.x:9092,x.x.x.x:9092,x.x.x.x:9092");
properties.setProperty("group.id", "flink");

DataStream<String> stream = env.addSource(
        new FlinkKafkaConsumer09<>("myTopic", new SimpleStringSchema(), properties));
stream.map(....);
env.execute();

I launch the same job 3 times.

If I execute this code with one broker it works well, but with 3 brokers (on 3 different machines) only one partition is read.

The solution proposed (in this question) was to create separate instances of the FlinkKafkaConsumer for each cluster (that's what you are already doing), and then union the resulting streams.

That does not work in my case.

So my questions are:

  1. Did I miss something?
  2. If we add a new machine to the Kafka cluster, do we need to change Flink's code to add a consumer for the new broker? Or can that be handled automatically at runtime?

Answer

It seems you've misunderstood the concept of Kafka's distributed streams.

A Kafka topic consists of several partitions (3 in your case). Each consumer can consume one or more of these partitions. If you start 3 instances of your app with the same group.id, each consumer will indeed read data from just one broker – Kafka tries to distribute the load evenly, so each consumer gets one partition.
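The balancing described above can be sketched with a small standalone simulation (the class and method names are hypothetical, and this round-robin split is only an illustration of the idea; Kafka's real assignors are configurable and more sophisticated):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupAssignmentDemo {

    // Round-robin split of a topic's partitions among the members of one
    // consumer group: partition p goes to member p % memberCount.
    static Map<String, List<Integer>> assign(List<String> consumers, int partitionCount) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String c : consumers) {
            assignment.put(c, new ArrayList<>());
        }
        for (int p = 0; p < partitionCount; p++) {
            assignment.get(consumers.get(p % consumers.size())).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // A single member of group "flink" owns all 3 partitions of myTopic.
        System.out.println(assign(List.of("job-1"), 3));
        // prints {job-1=[0, 1, 2]}

        // Three members with the same group.id get one partition each,
        // which is why each of the 3 identical jobs sees only one partition.
        System.out.println(assign(List.of("job-1", "job-2", "job-3"), 3));
        // prints {job-1=[0], job-2=[1], job-3=[2]}
    }
}
```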

I recommend reading more about this topic, especially about the concept of consumer groups, in the Kafka documentation.

In any case, FlinkKafkaConsumer09 can run in multiple parallel instances, each of which will pull data from one or more Kafka partitions. You don't need to worry about creating more instances of the consumer: a single consumer instance can pull records from all of the partitions.

I have no idea why you're starting the job 3 times instead of once with the parallelism set to 3. That would solve your problem:

DataStream<String> stream =
        env.addSource(new FlinkKafkaConsumer09<>("myTopic", new SimpleStringSchema(), properties))
                .setParallelism(3);
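The effect of `setParallelism(3)` can be sketched the same way (hypothetical names again; the modulo rule below is an assumption used for illustration, not Flink's exact assignment code):

```java
import java.util.ArrayList;
import java.util.List;

public class SubtaskAssignmentDemo {

    // Sketch of how a parallel source operator could spread Kafka partitions
    // over its subtasks: subtask i reads every partition p with
    // p % parallelism == i, so every partition is covered by exactly one subtask.
    static List<Integer> partitionsForSubtask(int subtaskIndex, int parallelism, int partitionCount) {
        List<Integer> owned = new ArrayList<>();
        for (int p = 0; p < partitionCount; p++) {
            if (p % parallelism == subtaskIndex) {
                owned.add(p);
            }
        }
        return owned;
    }

    public static void main(String[] args) {
        // One job with parallelism 1: the single subtask reads all 3 partitions.
        System.out.println(partitionsForSubtask(0, 1, 3));
        // prints [0, 1, 2]

        // One job with parallelism 3: each subtask reads exactly one partition,
        // so the whole topic is consumed by a single job.
        for (int i = 0; i < 3; i++) {
            System.out.println(i + " -> " + partitionsForSubtask(i, 3, 3));
        }
    }
}
```

Either way, all 3 partitions end up assigned inside one job, which is why a single submission with parallelism 3 replaces the 3 separate jobs.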

