Pyspark Structured Streaming Kafka configuration error


Question

I've been using pyspark for Spark Streaming (Spark 2.0.2) with Kafka (0.10.1.0) successfully, but my use case is better suited to Structured Streaming. I've attempted to use the example online: https://spark.apache.org/docs/2.1.0/structured-streaming-kafka-integration.html

with code similar to the following:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read the Kafka topic as a streaming DataFrame
ds1 = spark \
    .readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "host1:port1,host2:port2") \
    .option("subscribe", "topic1") \
    .load()

# Echo each micro-batch to the console
query = ds1 \
    .writeStream \
    .outputMode('append') \
    .format('console') \
    .start()

query.awaitTermination()

However, I always end up with the following error:

: org.apache.kafka.common.config.ConfigException: 
Missing required configuration "partition.assignment.strategy" which has no default value

I also tried adding this to my set of options when creating ds1:

.option("partition.assignment.strategy", "range")

But even explicitly assigning it a value didn't stop the error, nor did any other value (like "roundrobin") that I could find online or in the Kafka documentation.

I also tried this with the "assign" option and got the same error (our Kafka host is set up for assign: each consumer is assigned exactly one partition, and we don't do any rebalancing).
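For reference, here is a minimal sketch of the "assign" variant I tried (the partition mapping below is a placeholder; the "assign" option takes a JSON string mapping topics to partition lists):

# "assign" replaces "subscribe": the consumer reads only the listed partitions.
ds1 = spark \
    .readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "host1:port1,host2:port2") \
    .option("assign", '{"topic1": [0]}') \
    .load()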

Any idea what's going on here? The documentation isn't helpful (probably since it's still in an experimental phase). Also, is there any way to do Structured Streaming using KafkaUtils? Or is this the only gateway?

Answer

1. There is a known issue in the Kafka 0.10.1.* client, and you should not use it with Spark because it can return wrong answers due to https://issues.apache.org/jira/browse/KAFKA-4547. Use the 0.10.0.1 client instead; it should work with a 0.10.1.* Kafka cluster.
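If the 0.10.1.* client is coming in transitively, one way to pull in the connector together with its kafka-clients 0.10.0.1 dependency is --packages; the Scala/Spark versions and script name below are assumptions, adjust them to your build:

spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.1.0 \
  your_streaming_app.py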

2. To pass a Kafka configuration through to the Kafka consumer client in Structured Streaming, you need to add the kafka. prefix, e.g. .option("kafka.partition.assignment.strategy", "range"). However, you don't need to set kafka.partition.assignment.strategy at all, because it has a default value. My hunch is that you have both Kafka 0.8.* and 0.10.* jars on the classpath and the wrong classes are being loaded.
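A minimal sketch of how a prefixed consumer property is passed (shown only to illustrate the kafka. prefix; as noted above, this particular setting is normally unnecessary):

# Consumer properties reach the Kafka client only with the "kafka." prefix.
ds1 = spark \
    .readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "host1:port1,host2:port2") \
    .option("subscribe", "topic1") \
    .option("kafka.partition.assignment.strategy", "range") \
    .load()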

3. Which API from KafkaUtils do you want to use that is missing in Structured Streaming? Spark 2.2.0 has just been released, and with it you can run both batch and streaming queries against Kafka in Structured Streaming. See http://spark.apache.org/docs/2.2.0/structured-streaming-kafka-integration.html for examples.
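For instance, a one-off batch read of a topic looks like this (a minimal sketch using the host and topic placeholders from the question):

# Batch query: read the topic's current contents as a static DataFrame.
df = spark \
    .read \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "host1:port1,host2:port2") \
    .option("subscribe", "topic1") \
    .load()

# key/value arrive as binary; cast to strings before inspecting.
df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)").show()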
