Spark Structured Streaming Kafka Integration Offset management
Problem Description
The documentation says:
enable.auto.commit: Kafka source doesn't commit any offset.
Hence my question is, in the event of a worker or partition crash/restart:
- With startingOffsets set to latest, how do we not lose messages?
- With startingOffsets set to earliest, how do we not reprocess all messages?
This seems to be quite important. Any indication on how to deal with it?
Recommended Answer
I ran into this issue as well.
You're right in your observations on the two options, i.e.:
- potential data loss if startingOffsets is set to latest
- duplicate data if startingOffsets is set to earliest
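For concreteness, here is a minimal sketch (Scala) of where startingOffsets is configured on the Kafka source; the broker address and topic name are placeholders, not values from the question:

import org.apache.spark.sql.SparkSession

object StartingOffsetsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("starting-offsets-sketch")
      .getOrCreate()

    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // placeholder broker
      .option("subscribe", "my-topic")                  // placeholder topic
      // "latest"   -> only records arriving after the query starts (possible loss)
      // "earliest" -> everything still retained in the topic (possible duplicates)
      .option("startingOffsets", "latest")
      .load()

    df.printSchema() // key, value, topic, partition, offset, timestamp, ...
  }
}

Note that startingOffsets only applies when a query starts without an existing checkpoint; a resumed query always picks up from where it left off.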
However...
There is the option of checkpointing by adding the following option:
.writeStream.<something>.option("checkpointLocation", "path/to/HDFS/dir").<something else>
In the event of a failure, Spark would go through the contents of this checkpoint directory and recover the state before accepting any new data.
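Putting it together, a minimal end-to-end sketch might look like the following; the broker address, topic names, and checkpoint path are all hypothetical:

import org.apache.spark.sql.SparkSession

object CheckpointSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("checkpoint-sketch")
      .getOrCreate()

    val input = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // placeholder broker
      .option("subscribe", "in-topic")                  // placeholder topic
      .load()

    val query = input
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("topic", "out-topic")                     // placeholder topic
      // Progress (offsets and state) is persisted here; on restart Spark
      // resumes from the last committed batch instead of consulting
      // startingOffsets.
      .option("checkpointLocation", "path/to/HDFS/dir")
      .start()

    query.awaitTermination()
  }
}

On restart with the same checkpointLocation, the query replays only the batches that were not fully committed, which is what gives at-least-once delivery (and, with an idempotent sink, effectively exactly-once behavior).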
I found this useful reference on the same.
Hope this helps!