How to manually set group.id and commit kafka offsets in spark structured streaming?


Problem Description

I was going through the Spark structured streaming - Kafka integration guide here.

The guide states:

enable.auto.commit: Kafka source doesn't commit any offset.

So how do I manually commit offsets once my spark application has successfully processed each record?

Recommended Answer

Current Situation (Spark 2.4.5)

This feature seems to be under discussion in the Spark community: https://github.com/apache/spark/pull/24613.

In that Pull Request you will also find a possible solution at https://github.com/HeartSaVioR/spark-sql-kafka-offset-committer.
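The gist of that approach can be sketched with a StreamingQueryListener. This is a minimal illustration, not the linked project's actual code: the class name KafkaOffsetCommitListener and its constructor parameters are made up here, and it assumes Spark's Kafka source reports endOffset as JSON of the form {"topic":{"partition":offset}}. After each micro-batch, the listener commits the batch's end offsets back to Kafka under a group.id of your choosing:

import java.util.Properties

import scala.collection.JavaConverters._

import com.fasterxml.jackson.databind.ObjectMapper
import org.apache.kafka.clients.consumer.{KafkaConsumer, OffsetAndMetadata}
import org.apache.kafka.common.TopicPartition
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

// Hypothetical listener: after every micro-batch, commit the batch's end
// offsets back to Kafka under a chosen group.id. Spark itself never reads
// these commits; it recovers from its checkpoint, so the commits exist only
// for external monitoring.
class KafkaOffsetCommitListener(bootstrapServers: String, groupId: String)
    extends StreamingQueryListener {

  private val mapper = new ObjectMapper()

  override def onQueryStarted(event: QueryStartedEvent): Unit = ()
  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()

  override def onQueryProgress(event: QueryProgressEvent): Unit = {
    event.progress.sources
      .filter(s => s.description != null && s.description.toLowerCase.contains("kafka"))
      .foreach { source =>
        val offsets = parseEndOffset(source.endOffset)
        if (offsets.nonEmpty) commit(offsets)
      }
  }

  // Assumption: endOffset is a JSON string like {"myTopic":{"0":42,"1":17}}.
  private def parseEndOffset(json: String): Map[TopicPartition, OffsetAndMetadata] = {
    val root = mapper.readTree(json)
    root.fields().asScala.flatMap { topicEntry =>
      topicEntry.getValue.fields().asScala.map { partitionEntry =>
        new TopicPartition(topicEntry.getKey, partitionEntry.getKey.toInt) ->
          new OffsetAndMetadata(partitionEntry.getValue.asLong())
      }
    }.toMap
  }

  private def commit(offsets: Map[TopicPartition, OffsetAndMetadata]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", bootstrapServers)
    // The chosen group.id should not be shared with any other active consumers.
    props.put("group.id", groupId)
    props.put("key.deserializer",
      "org.apache.kafka.common.serialization.ByteArrayDeserializer")
    props.put("value.deserializer",
      "org.apache.kafka.common.serialization.ByteArrayDeserializer")
    val consumer = new KafkaConsumer[Array[Byte], Array[Byte]](props)
    try consumer.commitSync(offsets.asJava) finally consumer.close()
  }
}

Such a listener would be registered before starting the query, e.g. spark.streams.addListener(new KafkaOffsetCommitListener("localhost:9092", "my-monitoring-group")), where the broker address and group id are placeholders. The commits only make the query's progress visible to standard Kafka tooling such as kafka-consumer-groups.sh.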

At the moment, the Spark Structured Streaming + Kafka integration documentation clearly states how it manages Kafka offsets: the query persists them in its own checkpoint rather than committing them to Kafka (a minimal sketch follows the list below). The most important Kafka specific configurations for offsets are:

  • group.id: Kafka source will create a unique group id for each query automatically. According to the code the group.id will be set to:
val uniqueGroupId = s"spark-kafka-source-${UUID.randomUUID}-${metadataPath.hashCode}"
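
As referenced above, here is a minimal sketch of the default, no-manual-commit behaviour; the broker address, topic name, and checkpoint path are placeholders. The point is that the checkpointLocation, not a Kafka consumer group, is where the query's offsets live:

import org.apache.spark.sql.SparkSession

object KafkaCheckpointExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-offsets-via-checkpoint")
      .getOrCreate()

    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // placeholder broker
      .option("subscribe", "myTopic")                      // placeholder topic
      .load()

    // No offsets are committed to Kafka. Spark writes the consumed offsets to
    // the checkpoint directory, so a restarted query resumes where it left off.
    df.selectExpr("CAST(value AS STRING)")
      .writeStream
      .format("console")
      .option("checkpointLocation", "/tmp/checkpoints/my-query")
      .start()
      .awaitTermination()
  }
}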
