Spark Structured Streaming with Confluent Cloud Kafka connectivity issue
Problem Description
I am writing a Spark Structured Streaming application in PySpark to read data from Kafka in Confluent Cloud. The documentation for Spark's readStream() function is shallow and says little about the optional parameters, especially the authentication mechanism. I am not sure which parameter is wrong and breaking the connectivity. Can anyone with Spark experience help me get this connection working?
Required parameters
Consumer({
    'bootstrap.servers': 'cluster.gcp.confluent.cloud:9092',
    'sasl.username': 'xxx',
    'sasl.password': 'xxx',
    'sasl.mechanisms': 'PLAIN',
    'security.protocol': 'SASL_SSL',
    'group.id': 'python_example_group_1',
    'auto.offset.reset': 'earliest'
})
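(For context, these keys come from the confluent-kafka Python client. A minimal sketch of how they are used there, with the 'xxx' credentials and the topic name as placeholders:)

from confluent_kafka import Consumer

# Same settings as listed above; credentials are placeholders
consumer = Consumer({
    'bootstrap.servers': 'cluster.gcp.confluent.cloud:9092',
    'sasl.username': 'xxx',
    'sasl.password': 'xxx',
    'sasl.mechanisms': 'PLAIN',
    'security.protocol': 'SASL_SSL',
    'group.id': 'python_example_group_1',
    'auto.offset.reset': 'earliest',
})

consumer.subscribe(['test-topic'])
msg = consumer.poll(10.0)  # returns None if nothing arrives within the timeout
if msg is not None and msg.error() is None:
    print(msg.value())
consumer.close()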
Here is my PySpark code:
df = spark \
.readStream \
.format("kafka") \
.option("kafka.bootstrap.servers", "cluster.gcp.confluent.cloud:9092") \
.option("subscribe", "test-topic") \
.option("kafka.sasl.mechanisms", "PLAIN")\
.option("kafka.security.protocol", "SASL_SSL")\
.option("kafka.sasl.username","xxx")\
.option("kafka.sasl.password", "xxx")\
.option("startingOffsets", "latest")\
.option("kafka.group.id", "python_example_group_1")\
.load()
display(df)
However, I keep getting an error:
kafkashaded.org.apache.kafka.common.KafkaException: Failed to construct kafka consumer
Databricks notebook - used for testing
Documentation
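My best guess so far is that the Python client options above do not map one-to-one onto Spark's Kafka source: the Java client that Spark wraps spells the option kafka.sasl.mechanism (singular), and it takes credentials through kafka.sasl.jaas.config rather than sasl.username / sasl.password, which as far as I can tell are librdkafka-only properties. A sketch of what I believe the option set should look like, assuming a Databricks runtime where the Kafka classes are shaded (plain Spark would drop the kafkashaded. prefix):

df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "cluster.gcp.confluent.cloud:9092")
      .option("subscribe", "test-topic")
      .option("kafka.security.protocol", "SASL_SSL")
      # singular 'mechanism' here, unlike the Python client's 'sasl.mechanisms'
      .option("kafka.sasl.mechanism", "PLAIN")
      # the Java client reads SASL credentials from a JAAS string; the
      # 'kafkashaded.' prefix is Databricks-specific, plain Spark would use
      # 'org.apache.kafka.common.security.plain.PlainLoginModule'
      .option("kafka.sasl.jaas.config",
              'kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule '
              'required username="xxx" password="xxx";')
      .option("startingOffsets", "latest")
      .load())

If that is right, then with SASL_SSL enabled but no JAAS entry supplied, the consumer cannot be constructed at all, which would match the "Failed to construct kafka consumer" exception above.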