Spark 2.1结构化流-使用Kakfa作为Python的源(pyspark) [英] Spark 2.1 Structured Streaming - Using Kakfa as source with Python (pyspark)

查看:87
本文介绍了Spark 2.1结构化流-使用Kakfa作为Python的源(pyspark)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

对于Apache Spark 2.1版,我想将Kafka(0.10.0.2.5)用作pyspark的结构化流的来源:

With Apache Spark version 2.1, I would like to use Kafka (0.10.0.2.5) as source for Structured Streaming with pyspark:

kafka_app.py:

kafka_app.py:

from pyspark.sql import SparkSession

spark=SparkSession.builder.appName("TestKakfa").getOrCreate()

kafka=spark.readStream.format("kafka") \
.option("kafka.bootstrap.servers","localhost:6667") \
.option("subscribe","mytopic").load()

我通过以下方式启动了该应用程序:

I launched the app in the following way:

./bin/spark-submit kafka_app.py --master local[4] --jars spark-streaming-kafka-0-10-assembly_2.10-2.1.0.jar

从mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka-0-10-assembly_2.10/2.1.0下载.jar后

After having downloaded the .jar from mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka-0-10-assembly_2.10/2.1.0

我得到这样的错误:

[...] java.lang.ClassNotFoundException:Failed to find data source: kakfa. [...]

类似地,我无法运行与Kakfa集成的Spark示例:

Similarly, I cannot run the Spark example of integration with Kakfa : https://spark.apache.org/docs/2.1.0/structured-streaming-kafka-integration.html

所以我想知道我错在哪里或者实际上是否支持使用pyspark将Kafka与Spark 2.1集成,因为此页面仅提及Scala和Java作为版本0.10中受支持的语言,这使我产生疑问:

So I wonder where I am wrong or whether Kafka integration with Spark 2.1 using pyspark is actually supported as this page mentioning only Scala and Java as supported language in the version 0.10 makes me doubt : https://spark.apache.org/docs/latest/streaming-kafka-integration.html (But if not supported yet, why an example in Python was published ?)

在此先感谢您的帮助!

推荐答案

您需要使用sql结构的流jar"spark-sql-kafka-0-10_2.11-2.1.0.jar",而不是spark-streaming -kafka-0-10-assembly_2.10-2.1.0.jar.

You need to use sql-structured streaming jar "spark-sql-kafka-0-10_2.11-2.1.0.jar" instead of spark-streaming-kafka-0-10-assembly_2.10-2.1.0.jar.

这篇关于Spark 2.1结构化流-使用Kakfa作为Python的源(pyspark)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆