Pyspark: Exception: Java gateway process exited before sending the driver its port number
Problem description
I'm trying to run pyspark on my MacBook Air. When I try starting it up, I get the error:
Exception: Java gateway process exited before sending the driver its port number
when sc = SparkContext() is called at startup. I have tried running the following commands:
./bin/pyspark
./bin/spark-shell
export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"
to no avail. I have also looked here:
Spark + Python - Java gateway process exited before sending the driver its port number?
but the question has never been answered. Please help! Thanks.
Recommended answer
I had this error message running pyspark on Ubuntu, and got rid of it by installing the openjdk-8-jdk package:
from pyspark import SparkConf, SparkContext
sc = SparkContext(conf=SparkConf().setAppName("MyApp").setMaster("local"))
# ^^^ raises: Exception: Java gateway process exited before sending the driver its port number
Install Open JDK 8:
apt-get install openjdk-8-jdk-headless -qq
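If more than one Java is installed, a minimal sketch of pointing PySpark at the freshly installed JDK is to set JAVA_HOME before the SparkContext is created. The path below is the usual install location for the openjdk-8 packages on Ubuntu and is an assumption; adjust it to your system:

```python
import glob
import os

# Typical location of the openjdk-8 packages on Ubuntu; this path is an
# assumption -- change it if your JDK lives elsewhere.
candidates = sorted(glob.glob("/usr/lib/jvm/java-8-openjdk*"))
java_home = candidates[0] if candidates else "/usr/lib/jvm/java-8-openjdk-amd64"

# PySpark's launcher consults JAVA_HOME when starting the JVM gateway,
# so this must run before SparkContext() is called.
os.environ["JAVA_HOME"] = java_home
```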
On macOS
The same happened on macOS; I typed in a terminal:
$ java -version
No Java runtime present, requesting install.
I was prompted to install Java from Oracle's download site, chose the macOS installer, clicked on jdk-13.0.2_osx-x64_bin.dmg, and after that checked that Java was installed:
$ java -version
java version "13.0.2" 2020-01-14
EDIT: To install JDK 8, you need to go to https://www.oracle.com/java/technologies/javase-jdk8-downloads.html (login required).
After that I was able to start a Spark context with pyspark.
In Python:
from pyspark import SparkContext
sc = SparkContext.getOrCreate()
# check that it really works by running a job
# example from http://spark.apache.org/docs/latest/rdd-programming-guide.html#parallelized-collections
data = range(10000)
distData = sc.parallelize(data)
distData.filter(lambda x: not x & 1).take(10)  # keep even numbers
# Out: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
Note that you might need to set the environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON, and they must point to the same Python version as the Python (or IPython) you're using to run pyspark (the driver).
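One way to guarantee the driver and the workers agree on the Python version is to set both variables from the script itself, before pyspark starts any workers. This is just a sketch that reuses the interpreter currently running the script (sys.executable):

```python
import os
import sys

# Use the interpreter running this script for both the driver and the
# workers, so their Python versions can never disagree.
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable
```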