Spark structured streaming with Kafka leads to only one batch (PySpark)


Problem Description

I have the following code and I'm wondering why it generates only one batch:

df = spark.readStream.format("kafka").option("kafka.bootstrap.servers", "IP").option("subscribe", "Topic").option("startingOffsets", "earliest").load()
# group-by on sliding windows (elided here), producing slidingWindowsDF
query = slidingWindowsDF.writeStream.queryName("bla").outputMode("complete").format("memory").start()

The application is launched with the following parameters:

spark.streaming.backpressure.initialRate 5
spark.streaming.backpressure.enabled True
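
These settings can be passed on the command line, for example (a minimal sketch; the application name app.py is a placeholder):

spark-submit \
  --conf spark.streaming.backpressure.enabled=true \
  --conf spark.streaming.backpressure.initialRate=5 \
  app.py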

The Kafka topic contains around 11 million messages. I'm expecting it to generate at least two batches because of the initialRate parameter, but it generates only one. Can anyone tell me why Spark processes everything in a single batch?

I'm using Spark 2.2.1 and Kafka 1.0.

Recommended Answer

That is because the spark.streaming.backpressure.initialRate parameter is used only by the old Spark Streaming (DStreams) API, not by Structured Streaming.

Instead, use the maxOffsetsPerTrigger option: http://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html
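
For example, a minimal sketch (the limit of 100000 offsets per micro-batch is an arbitrary illustration; tune it to your throughput):

# Cap each micro-batch at 100,000 Kafka offsets (records),
# so ~11 million messages are spread over roughly 110 batches
# instead of being consumed in a single batch.
df = spark.readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "IP") \
    .option("subscribe", "Topic") \
    .option("startingOffsets", "earliest") \
    .option("maxOffsetsPerTrigger", 100000) \
    .load()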

BTW, see also this answer: How Spark Structured Streaming handles backpressure? Structured Streaming does not currently have full backpressure support.
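
To confirm that the stream is now split into multiple micro-batches, you can inspect the query's progress reports (a sketch; recentProgress holds a bounded list of the most recent batch updates):

# Each progress entry corresponds to one completed micro-batch.
for progress in query.recentProgress:
    print(progress["batchId"], progress["numInputRows"])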
