TypeError:"JavaPackage"对象不可调用&在类路径中找不到Spark Streaming的Kafka库 [英] TypeError: 'JavaPackage' object is not callable & Spark Streaming's Kafka libraries not found in class path
问题描述
我使用pyspark流式传输读取kafka数据,但出错了:
I use pyspark streaming to read kafka data, but it went wrong:
import os
from pyspark.streaming.kafka import KafkaUtils
from pyspark.streaming import StreamingContext
from pyspark import SparkContext
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-streaming-kafka-0-8:2.0.2 pyspark-shell'
sc = SparkContext(appName="test")
sc.setLogLevel("WARN")
ssc = StreamingContext(sc, 60)
kafkaStream = KafkaUtils.createStream(ssc, "localhost:2181", "test-id", {'test': 2})
kafkaStream.map(lambda x: x.split(" ")).pprint()
ssc.start()
ssc.awaitTermination()
________________________________________________________________________________________________
Spark Streaming's Kafka libraries not found in class path. Try one of the following.
1. Include the Kafka library and its dependencies with in the
spark-submit command as
$ bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8:2.4.3 ...
2. Download the JAR of the artifact from Maven Central http://search.maven.org/,
Group Id = org.apache.spark, Artifact Id = spark-streaming-kafka-0-8-assembly, Version = 2.4.3.
Then, include the jar in the spark-submit command as
$ bin/spark-submit --jars <spark-streaming-kafka-0-8-assembly.jar> ...
________________________________________________________________________________________________
Traceback (most recent call last):
File "/home/docs/dp_model/dp_algo_platform/dp_algo_core/test/test.py", line 29, in <module>
kafkaStream = KafkaUtils.createStream(ssc, "localhost:2181", "test-id", {'test': 2})
File "/home/softs/spark-2.4.3-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/streaming/kafka.py", line 78, in createStream
File "/home/softs/spark-2.4.3-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/streaming/kafka.py", line 217, in _get_helper
TypeError: 'JavaPackage' object is not callable
我的spark版本:2.4.3,kafka版本:2.1.0,我替换了 os.environ ['PYSPARK_SUBMIT_ARGS'] ='-打包org.apache.spark:spark-streaming-kafka-0-8:2.0.2 pyspark-shell'
与 os.environ ['PYSPARK_SUBMIT_ARGS'] ='-将org.apache.spark:spark-streaming-kafka-0-8:2.4.3打包pyspark-shell'
,它也无法正常工作.我该怎么办?
My spark version: 2.4.3, kafka version: 2.1.0, and I replace os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-streaming-kafka-0-8:2.0.2 pyspark-shell'
with os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-streaming-kafka-0-8:2.4.3 pyspark-shell'
, it cannot work either. How can I do it?
推荐答案
我认为您应该四处走动,以便在导入和初始化Spark变量之前先将变量加载到环境中
I think you should move around your imports such that the environment is loaded with the variable before you import and initialize the Spark variables
您肯定还需要使用与Spark版本相同的软件包
You also definitely need to be using the same version of packages as your Spark version
import os
sparkVersion = '2.4.3' # update this accordingly
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-streaming-kafka-0-8:{} pyspark-shell'.format(sparkVersion)
# import Spark core
from pyspark.sql import SparkSession
from pyspark.streaming import StreamingContext
# import extra packages
from pyspark.streaming.kafka import KafkaUtils
# begin application
spark = SparkSession.builder.appName("test").getOrCreate()
sc = spark.sparkContext
注意:自Spark 2.3.0起已不支持Kafka 0.8支持
这篇关于TypeError:"JavaPackage"对象不可调用&在类路径中找不到Spark Streaming的Kafka库的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!