FLINK: How to read from multiple Kafka clusters using the same StreamExecutionEnvironment


Problem description

I want to read data from multiple Kafka clusters in FLINK.

But the result is that the kafkaMessageStream reads only from the first Kafka cluster.

I am able to read from both Kafka clusters only if I have two separate streams, one for each cluster, which is not what I want.

Is it possible to have multiple sources attached to a single reader?

Sample code:

public class KafkaReader<T> implements Reader<T>{

private StreamExecutionEnvironment executionEnvironment ;

public StreamExecutionEnvironment getExecutionEnvironment(Properties properties){
    executionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment();
    executionEnvironment.setRestartStrategy( RestartStrategies.fixedDelayRestart(3, 1500));

    executionEnvironment.enableCheckpointing(
            Integer.parseInt(properties.getProperty(Constants.SSE_CHECKPOINT_INTERVAL,"5000")), CheckpointingMode.EXACTLY_ONCE);
    executionEnvironment.getCheckpointConfig().setCheckpointTimeout(60000);
    //executionEnvironment.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
    //try {
    //  executionEnvironment.setStateBackend(new FsStateBackend(new Path(Constants.SSE_CHECKPOINT_PATH)));
        // The RocksDBStateBackend or The FsStateBackend
    //} catch (IOException e) {
        // LOGGER.error("Exception during initialization of stateBackend in execution environment"+e.getMessage());
    //}

    return executionEnvironment;
}
public DataStream<T> readFromMultiKafka(Properties properties_k1, Properties properties_k2 ,DeserializationSchema<T> deserializationSchema) {


    DataStream<T> kafkaMessageStream = executionEnvironment.addSource(new FlinkKafkaConsumer08<T>( 
            properties_k1.getProperty(Constants.TOPIC),deserializationSchema, 
            properties_k1));
    executionEnvironment.addSource(new FlinkKafkaConsumer08<T>( 
            properties_k2.getProperty(Constants.TOPIC),deserializationSchema, 
            properties_k2));

    return kafkaMessageStream;
}


public DataStream<T> readFromKafka(Properties properties,DeserializationSchema<T> deserializationSchema) {


    DataStream<T> kafkaMessageStream = executionEnvironment.addSource(new FlinkKafkaConsumer08<T>( 
            properties.getProperty(Constants.TOPIC),deserializationSchema, 
            properties));

    return kafkaMessageStream;
}

}

My call:

 public static void main( String[] args ) throws Exception
{
    Properties pk1 = new Properties();
    pk1.setProperty(Constants.TOPIC, "flink_test");
    pk1.setProperty("zookeeper.connect", "localhost:2181");
    pk1.setProperty("group.id", "1");
    pk1.setProperty("bootstrap.servers", "localhost:9092");
    Properties pk2 = new Properties();
    pk2.setProperty(Constants.TOPIC, "flink_test");
    pk2.setProperty("zookeeper.connect", "localhost:2182");
    pk2.setProperty("group.id", "1");
    pk2.setProperty("bootstrap.servers", "localhost:9093");


    Reader<String> reader = new KafkaReader<String>();
    // Does not work:

    StreamExecutionEnvironment environment = reader.getExecutionEnvironment(pk1);
    DataStream<String> dataStream1 = reader.readFromMultiKafka(pk1, pk2, new SimpleStringSchema());
    DataStream<ImpressionObject> transform = new TsvTransformer().transform(dataStream1);

    transform.print();      


  //Works:

    StreamExecutionEnvironment environment = reader.getExecutionEnvironment(pk1);
    DataStream<String> dataStream1 = reader.readFromKafka(pk1, new SimpleStringSchema());
    DataStream<String> dataStream2 = reader.readFromKafka(pk2, new SimpleStringSchema());

    DataStream<Tuple2<String, Integer>> transform1 = dataStream1.flatMap(new LineSplitter()).keyBy(0)
    .timeWindow(Time.seconds(5)).sum(1).setParallelism(5);
    DataStream<Tuple2<String, Integer>> transform2 = dataStream2.flatMap(new LineSplitter()).keyBy(0)
            .timeWindow(Time.seconds(5)).sum(1).setParallelism(5);


    transform1.print();     
    transform2.print();     

    environment.execute("Kafka Reader");
}

Recommended answer

To resolve the issue, I would recommend you to create separate instances of the FlinkKafkaConsumer for each cluster (that's what you are already doing), and then union the resulting streams:

StreamExecutionEnvironment environment = reader.getExecutionEnvironment(pk1);
DataStream<String> dataStream1 = reader.readFromKafka(pk1, new SimpleStringSchema());
DataStream<String> dataStream2 = reader.readFromKafka(pk2, new SimpleStringSchema());
DataStream<String> finalStream = dataStream1.union(dataStream2);
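
If you still want the single readFromMultiKafka entry point from the question, a minimal sketch under the same assumptions (the Constants.TOPIC key and the FlinkKafkaConsumer08 connector already used above) would be to union inside the reader instead of returning only the first stream:

public DataStream<T> readFromMultiKafka(Properties properties_k1, Properties properties_k2,
        DeserializationSchema<T> deserializationSchema) {

    // One consumer per cluster, each configured with its own bootstrap.servers / zookeeper.connect
    DataStream<T> stream1 = executionEnvironment.addSource(new FlinkKafkaConsumer08<T>(
            properties_k1.getProperty(Constants.TOPIC), deserializationSchema, properties_k1));
    DataStream<T> stream2 = executionEnvironment.addSource(new FlinkKafkaConsumer08<T>(
            properties_k2.getProperty(Constants.TOPIC), deserializationSchema, properties_k2));

    // union merges both sources into one DataStream of the same type, so every
    // downstream operator (the transformer, print(), etc.) sees records from both clusters
    return stream1.union(stream2);
}

The original readFromMultiKafka added the second source but never used the stream it returned, so no downstream operator was ever attached to the second consumer; union (or connect, if the two streams had different types) is what actually wires both sources into the same pipeline.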
