TypeError: 'JavaPackage' object is not callable & Spark Streaming's Kafka libraries not found in class path


Problem Description

I use PySpark Streaming to read Kafka data, but it fails:

import os
from pyspark.streaming.kafka import KafkaUtils
from pyspark.streaming import StreamingContext
from pyspark import SparkContext
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-streaming-kafka-0-8:2.0.2 pyspark-shell'
sc = SparkContext(appName="test")
sc.setLogLevel("WARN")
ssc = StreamingContext(sc, 60)
kafkaStream = KafkaUtils.createStream(ssc, "localhost:2181", "test-id", {'test': 2})
kafkaStream.map(lambda x: x.split(" ")).pprint()

ssc.start()
ssc.awaitTermination()

________________________________________________________________________________________________

Spark Streaming's Kafka libraries not found in class path. Try one of the following.

1. Include the Kafka library and its dependencies with in the
 spark-submit command as

 $ bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8:2.4.3 ...

2. Download the JAR of the artifact from Maven Central http://search.maven.org/,
 Group Id = org.apache.spark, Artifact Id = spark-streaming-kafka-0-8-assembly, Version = 2.4.3.
 Then, include the jar in the spark-submit command as

 $ bin/spark-submit --jars <spark-streaming-kafka-0-8-assembly.jar> ...

________________________________________________________________________________________________


Traceback (most recent call last):
  File "/home/docs/dp_model/dp_algo_platform/dp_algo_core/test/test.py", line 29, in <module>
    kafkaStream = KafkaUtils.createStream(ssc, "localhost:2181", "test-id", {'test': 2})
  File "/home/softs/spark-2.4.3-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/streaming/kafka.py", line 78, in createStream
  File "/home/softs/spark-2.4.3-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/streaming/kafka.py", line 217, in _get_helper
TypeError: 'JavaPackage' object is not callable

My Spark version is 2.4.3 and my Kafka version is 2.1.0. I replaced os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-streaming-kafka-0-8:2.0.2 pyspark-shell' with os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-streaming-kafka-0-8:2.4.3 pyspark-shell', but it doesn't work either. How can I fix this?

Recommended Answer

I think you should move your imports around so that the environment variable is set before you import and initialize the Spark classes.

You also definitely need to use the same package version as your Spark version.
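As background for why a missing jar surfaces as TypeError: 'JavaPackage' object is not callable: PySpark resolves JVM classes lazily through py4j, so sc._jvm.<some.class> returns a callable JavaClass when the class is on the JVM classpath, and an inert JavaPackage placeholder when it is not; createStream then tries to call that placeholder. A hypothetical quick probe (it pokes the private _jvm gateway, so it is a debugging aid, not a stable API):

# Probe for the helper class that KafkaUtils.createStream instantiates.
# If the Kafka jar was loaded, py4j hands back a JavaClass; otherwise it
# hands back a JavaPackage placeholder, which is the very object the
# traceback above tried, and failed, to call.
helper = sc._jvm.org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper
print(type(helper).__name__)  # 'JavaClass' if loaded, 'JavaPackage' if not

With that in mind, a working setup looks like this: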

import os

sparkVersion = '2.4.3'  # update this to match your Spark installation
# Note the Scala build suffix (_2.11 for the default Spark 2.4.3 binaries);
# without it the package cannot be resolved from Maven Central.
os.environ['PYSPARK_SUBMIT_ARGS'] = (
    '--packages org.apache.spark:spark-streaming-kafka-0-8_2.11:{} pyspark-shell'
    .format(sparkVersion)
)

# import Spark core
from pyspark.sql import SparkSession
from pyspark.streaming import StreamingContext
# import extra packages
from pyspark.streaming.kafka import KafkaUtils

# begin application
spark = SparkSession.builder.appName("test").getOrCreate()
sc = spark.sparkContext
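
From there, the rest of the original program runs as before. A sketch that reuses the ZooKeeper address and topic from the question, with one detail worth noting: createStream yields (key, value) pairs, so split the value rather than the tuple:

ssc = StreamingContext(sc, 60)
kafkaStream = KafkaUtils.createStream(ssc, "localhost:2181", "test-id", {'test': 2})

# Messages arrive as (key, value) tuples; split the value, not the tuple.
kafkaStream.map(lambda msg: msg[1].split(" ")).pprint()

ssc.start()
ssc.awaitTermination()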

Note: Kafka 0.8 support is deprecated as of Spark 2.3.0.
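
For that reason, the longer-term path is Structured Streaming with the spark-sql-kafka-0-10 package, which reads from the Kafka broker directly (typically port 9092) rather than through ZooKeeper. A minimal sketch, assuming a broker at localhost:9092 and the same test topic as above:

import os

# As before, set the submit args before any pyspark import.
os.environ['PYSPARK_SUBMIT_ARGS'] = (
    '--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.3 pyspark-shell'
)

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("test").getOrCreate()

# Kafka delivers key/value as binary columns; cast the value to a string.
df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")  # broker, not ZooKeeper
      .option("subscribe", "test")
      .load()
      .selectExpr("CAST(value AS STRING) AS value"))

# Print each micro-batch to the console, roughly mirroring pprint() above.
query = (df.writeStream
         .outputMode("append")
         .format("console")
         .start())
query.awaitTermination()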
