How to manually commit offset in Spark Kafka direct streaming?


Problem description

I looked around hard but didn't find a satisfactory answer to this. Maybe I'm missing something. Please help.

We have a Spark streaming application consuming a Kafka topic, which needs to ensure end-to-end processing before advancing Kafka offsets, e.g. updating a database. This is much like building transaction support within the streaming system, and guaranteeing that each message is processed (transformed) and, more importantly, output.

I have read about Kafka DirectStreams. It says that for robust failure recovery in DirectStreaming mode, Spark checkpointing should be enabled, which stores the offsets along with the checkpoints. But the offset management is done internally (by setting Kafka config params such as "auto.offset.reset", "auto.commit.enable", and "auto.commit.interval.ms"). It does not say how (or whether) we can customize committing offsets (e.g. once we've finished loading the database). In other words, can we set "auto.commit.enable" to false and manage the offsets ourselves (much as we would a DB connection)?
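The "manage the offsets ourselves" idea can be sketched independently of Spark or the Kafka client. In the minimal sketch below (all names are hypothetical; `sink` stands in for the database write), the per-partition "committed" offset is advanced only after the side effect has succeeded, which is exactly what auto-commit cannot guarantee:

```python
# Minimal sketch of manual offset bookkeeping (hypothetical types, no Kafka
# client): advance the per-partition "committed" offset only after the side
# effect (standing in for the database write) has succeeded.

def process_and_commit(records, sink):
    """records: iterable of (partition, offset, value) tuples, in order.
    sink: callable returning True once the value is durably written.
    Returns {partition: next offset to resume from after a restart}."""
    committed = {}
    for partition, offset, value in records:
        if sink(value):  # commit only after the output is durably written
            committed[partition] = offset + 1
    return committed
```

If the sink fails partway through a batch, the offsets for the failed records are never advanced, so those records are re-read on restart (at-least-once semantics); making it exactly-once requires the output store to deduplicate or to hold the offsets transactionally, as discussed below.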

Any guidance/help is greatly appreciated.

Recommended answer

The article below could be a good start to understand the approach.

spark-kafka achieving zero-data-loss

More

The article suggests using a ZooKeeper client directly, which could also be replaced by something like KafkaSimpleConsumer. The advantage of using ZooKeeper/KafkaSimpleConsumer is compatibility with monitoring tools that rely on offsets saved in ZooKeeper. The information can also be saved on HDFS or any other reliable storage.
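One concrete variant of "saved on any other reliable storage" is to write the offsets in the same database transaction as the results, so that results and offsets become visible atomically and a restart resumes exactly where the last successful batch ended. A minimal sketch using SQLite (the table names, schema, and `(topic, partition, until_offset)` tuples are illustrative assumptions, not part of any Spark/Kafka API):

```python
import sqlite3

def init_schema(conn):
    """Create the (hypothetical) results and offsets tables."""
    conn.execute("CREATE TABLE IF NOT EXISTS results(value TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS kafka_offsets("
                 "topic TEXT, part INTEGER, until_offset INTEGER, "
                 "PRIMARY KEY(topic, part))")

def save_batch(conn, results, offset_ranges):
    """Write a batch of results AND its Kafka offsets in one transaction.
    results: list of output values; offset_ranges: (topic, partition,
    until_offset) tuples. On any failure the transaction rolls back, so
    neither results nor offsets advance and the batch is reprocessed."""
    with conn:  # sqlite3 connection context manager: commit or roll back
        conn.executemany("INSERT INTO results(value) VALUES (?)",
                         [(r,) for r in results])
        conn.executemany(
            "INSERT OR REPLACE INTO kafka_offsets(topic, part, until_offset) "
            "VALUES (?, ?, ?)", offset_ranges)

def resume_offsets(conn):
    """Read back the last committed offsets, e.g. to seed the direct
    stream's starting positions after a restart."""
    rows = conn.execute(
        "SELECT topic, part, until_offset FROM kafka_offsets").fetchall()
    return {(t, p): o for t, p, o in rows}
```

The same pattern applies to any transactional store; with a non-transactional one such as HDFS, writing the offsets file only after the batch output succeeds gives the weaker at-least-once guarantee instead.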
