Spring Batch - Kafka: KafkaItemReader reads the data ALWAYS from beginning


Question

I'm willing to use Spring Batch for Kafka data consumption. This spring-tips link has a basic example for the same.

Here is my reader:

  @Bean
  KafkaItemReader<String, String> kafkaItemReader() {
    var props = new Properties();
    props.putAll(this.properties.buildConsumerProperties());

    return new KafkaItemReaderBuilder<String, String>()
        .partitions(0)
        .consumerProperties(props)
        .name("customers-reader")
        .saveState(true)
        .topic("test-consumer")
        .build();
  }

My application.properties file:

 spring:
    kafka:
      consumer:
        bootstrap-servers: localhost:9092
        group-id: groupid-Dev
        enable-auto-commit: false
        auto-offset-reset: latest
        auto.commit.interval.ms: 1000
        key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
        value-deserializer: org.apache.kafka.common.serialization.StringDeserializer

Questions:

  • Every time I launch the job, it seeks the 0th offset, so I get messages from the beginning. Is this a bug?
  • Why do we need to manually supply the partitions to read from? If something changes in the future, will it affect my code?

Answer

Every time when I launch a job, it seeks 0th Offset. So, I am getting messages from beginning. Is this a bug?

No, this is a feature (seriously) :-) The choice to make the Kafka item reader read from the beginning of the partition was made for consistency with the other readers (they all start from the beginning of their datasource). But in the Kafka world, where the offset is a first-order concept, we will make the starting offset configurable (we have a PR for this). This will be shipped in the upcoming v4.3 planned for October 2020.
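Once that configurability lands, the builder should accept explicit starting offsets. A hedged sketch of what the reader bean above could look like, assuming a `partitionOffsets(Map<TopicPartition, Long>)` option is added in 4.3 and that an empty map means "resume from the consumer group's committed offsets" (both are assumptions until the release ships):

```java
import java.util.HashMap;
import java.util.Properties;

@Bean
KafkaItemReader<String, String> kafkaItemReader() {
  var props = new Properties();
  props.putAll(this.properties.buildConsumerProperties());

  return new KafkaItemReaderBuilder<String, String>()
      .partitions(0)
      // Assumption: with a 4.3-style builder, an empty offsets map would
      // tell the reader to start from the group's committed offsets
      // instead of always seeking back to offset 0.
      .partitionOffsets(new HashMap<>())
      .consumerProperties(props)
      .name("customers-reader")
      .saveState(true)
      .topic("test-consumer")
      .build();
}
```

This is wiring configuration rather than standalone code; it needs a running Kafka broker and the Spring Batch/Kafka dependencies on the classpath to actually run.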

Why do we need to manually supply partitions to read from?

Because Spring Batch cannot make the decision of which partitions to read from for a given topic name. We are open to suggestions about a reasonable default here.
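Until such a default exists, one workaround is to look up the topic's partitions from the broker at configuration time and hand all of them to the builder. A sketch using the Kafka consumer's metadata API (the helper name is hypothetical, and it assumes the topic already exists on the broker):

```java
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.PartitionInfo;

// Hypothetical helper: discover every partition of a topic so the reader
// covers the whole topic instead of a hard-coded partition 0.
static List<Integer> allPartitionsOf(String topic, Properties consumerProps) {
  try (var consumer = new KafkaConsumer<String, String>(consumerProps)) {
    return consumer.partitionsFor(topic).stream()
        .map(PartitionInfo::partition)
        .collect(Collectors.toList());
  }
}
```

The resulting list can then be passed to `partitions(...)` on the `KafkaItemReaderBuilder` in place of the hard-coded `0`. Note that this captures the partition count at startup; if partitions are added to the topic later, the job would need to be reconfigured.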
