Spark 2.1 Structured Streaming - 使用 Kakfa 作为 Python 源代码 (pyspark) [英] Spark 2.1 Structured Streaming - Using Kakfa as source with Python (pyspark)

查看:29
本文介绍了Spark 2.1 Structured Streaming - 使用 Kakfa 作为 Python 源代码 (pyspark)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

使用 Apache Spark 2.1 版,我想使用 Kafka (0.10.0.2.5) 作为带有 pyspark 的结构化流的源:

With Apache Spark version 2.1, I would like to use Kafka (0.10.0.2.5) as source for Structured Streaming with pyspark:

kafka_app.py:

kafka_app.py:

from pyspark.sql import SparkSession

spark=SparkSession.builder.appName("TestKakfa").getOrCreate()

kafka=spark.readStream.format("kafka") \
.option("kafka.bootstrap.servers","localhost:6667") \
.option("subscribe","mytopic").load()

我通过以下方式启动了应用程序:

I launched the app in the following way:

./bin/spark-submit kafka_app.py --master local[4] --jars spark-streaming-kafka-0-10-assembly_2.10-2.1.0.jar

从 mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka-0-10-assembly_2.10/2.1.0 下载 .jar 后

After having downloaded the .jar from mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka-0-10-assembly_2.10/2.1.0

我得到这样的错误:

[...] java.lang.ClassNotFoundException:Failed to find data source: kakfa. [...]

同样,我无法运行与 Kakfa 集成的 Spark 示例:https://spark.apache.org/docs/2.1.0/structured-streaming-kafka-integration.html

Similarly, I cannot run the Spark example of integration with Kakfa : https://spark.apache.org/docs/2.1.0/structured-streaming-kafka-integration.html

所以我想知道我哪里错了,或者实际上是否支持使用 pyspark 将 Kafka 与 Spark 2.1 集成,因为此页面仅提及 Scala 和 Java 作为 0.10 版本中支持的语言让我怀疑:https://spark.apache.org/docs/latest/streaming-kafka-integration.html(但如果还不支持,为什么要发布 Python 中的示例?)

So I wonder where I am wrong or whether Kafka integration with Spark 2.1 using pyspark is actually supported as this page mentioning only Scala and Java as supported language in the version 0.10 makes me doubt : https://spark.apache.org/docs/latest/streaming-kafka-integration.html (But if not supported yet, why an example in Python was published ?)

预先感谢您的帮助!

推荐答案

您需要使用 sql-structured Streaming jar "spark-sql-kafka-0-10_2.11-2.1.0.jar" 而不是 spark-streaming-kafka-0-10-assembly_2.10-2.1.0.jar.

You need to use sql-structured streaming jar "spark-sql-kafka-0-10_2.11-2.1.0.jar" instead of spark-streaming-kafka-0-10-assembly_2.10-2.1.0.jar.

这篇关于Spark 2.1 Structured Streaming - 使用 Kakfa 作为 Python 源代码 (pyspark)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆