key not found: _PYSPARK_DRIVER_CALLBACK_HOST
Question
I'm trying to run this code:
import pyspark
from pyspark.sql import SparkSession

# The parentheses let the builder chain span several lines.
spark = (
    SparkSession.builder
    .master("local")
    .appName("Word Count")
    .getOrCreate()
)

# Sample rows; ids 3 and 5 appear twice, and one row for id 3 is an exact duplicate.
df = spark.createDataFrame([
    (1, 144.5, 5.9, 33, 'M'),
    (2, 167.2, 5.4, 45, 'M'),
    (3, 124.1, 5.2, 23, 'F'),
    (4, 144.5, 5.9, 33, 'M'),
    (5, 133.2, 5.7, 54, 'F'),
    (3, 124.1, 5.2, 23, 'F'),
    (5, 129.2, 5.3, 42, 'M'),
], ['id', 'weight', 'height', 'age', 'gender'])

df.show()
print('Count of Rows: {0}'.format(df.count()))
print('Count of distinct Rows: {0}'.format(df.distinct().count()))

spark.stop()
And I get this error:
18/06/22 11:58:39 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[main,5,main]
java.util.NoSuchElementException: key not found: _PYSPARK_DRIVER_CALLBACK_HOST
...
Exception: Java gateway process exited before sending its port number
I'm using PyCharm on macOS, with Python 3.6 and Spark 2.3.1.
What is the possible cause of this error?
Recommended answer
This error is the result of a version mismatch. The environment variable referenced in the traceback (_PYSPARK_DRIVER_CALLBACK_HOST) was removed when the Py4J dependency was updated to 0.10.7, and that change was backported to the 2.3 branch in 2.3.1.
Considering the version information:
"I'm using PyCharm on macOS, with Python 3.6 and Spark 2.3.1."
It looks like you have the 2.3.1 package installed, but SPARK_HOME points to an older (2.3.0 or earlier) installation.
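One way to confirm such a mismatch is to compare the version of the pyspark package with the version of the installation under SPARK_HOME. The sketch below is only an illustration: the helper names spark_home_version and describe_mismatch are hypothetical, and it assumes SPARK_HOME points to a binary Spark distribution with a RELEASE file at its root whose first line looks like "Spark 2.3.0 built for Hadoop 2.7.3". Other layouts (for example a source build) may not have that file, in which case the check reports that it could not tell.

```python
import os
import re
from pathlib import Path


def spark_home_version(spark_home):
    """Return the version recorded in <spark_home>/RELEASE, or None.

    Assumption: binary Spark distributions ship a RELEASE file whose
    first line starts with "Spark <version>".
    """
    release_file = Path(spark_home) / "RELEASE"
    if not release_file.is_file():
        return None
    lines = release_file.read_text().splitlines()
    first_line = lines[0] if lines else ""
    match = re.match(r"Spark (\d+(?:\.\d+)*)", first_line)
    return match.group(1) if match else None


def describe_mismatch(package_version, spark_home):
    """Compare the pyspark package version with the one under SPARK_HOME."""
    if not spark_home:
        return "SPARK_HOME is not set"
    home_version = spark_home_version(spark_home)
    if home_version is None:
        return "Could not determine the Spark version under " + str(spark_home)
    if home_version != package_version:
        return "Mismatch: pyspark package is {0}, SPARK_HOME is {1}".format(
            package_version, home_version)
    return "OK: both are " + package_version


if __name__ == "__main__":
    try:
        import pyspark
    except ImportError:
        print("pyspark is not importable in this interpreter")
    else:
        print(describe_mismatch(pyspark.__version__,
                                os.environ.get("SPARK_HOME")))
```

Run it with the same interpreter and environment variables that PyCharm uses for the failing script. If it reports a mismatch, pointing SPARK_HOME at a 2.3.1 installation, or installing the pyspark package version that matches the existing installation, should bring the two back in line.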