Pyspark: Exception: Java gateway process exited before sending the driver its port number


Problem description

I'm trying to run pyspark on my MacBook Air. When I try starting it up I get the error:

Exception: Java gateway process exited before sending the driver its port number

when sc = SparkContext() is called at startup. I have tried running the following commands:

./bin/pyspark
./bin/spark-shell
export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"

to no avail. I have also looked here:

Spark + Python - Java gateway process exited before sending the driver its port number?

but that question was never answered. Please help! Thanks.

Recommended answer

I had this error message running pyspark on Ubuntu and got rid of it by installing the openjdk-8-jdk package:

from pyspark import SparkConf, SparkContext
sc = SparkContext(conf=SparkConf().setAppName("MyApp").setMaster("local"))
# ^^^ this line raises the exception

Install OpenJDK 8:

apt-get install openjdk-8-jdk-headless -qq    
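After installing, it can help to make sure pyspark can actually find the JDK. A minimal sketch, assuming the default Ubuntu install path for openjdk-8 (adjust the path for your distribution):

```shell
# Confirm the JDK is on the PATH
java -version

# Point JAVA_HOME at the newly installed JDK so pyspark can locate it
# (this path is the Ubuntu default for openjdk-8; yours may differ)
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH="$JAVA_HOME/bin:$PATH"
```

Adding the two export lines to ~/.bashrc makes the setting persist across shells.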

On macOS

Same on macOS. I typed in a terminal:

$ java -version
No Java runtime present, requesting install. 

I was prompted to install Java from Oracle's download site. I chose the macOS installer, clicked on jdk-13.0.2_osx-x64_bin.dmg, and afterwards checked that Java was installed:

$ java -version
java version "13.0.2" 2020-01-14

EDIT: To install JDK 8 you need to go to https://www.oracle.com/java/technologies/javase-jdk8-downloads.html (login required).
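On macOS, once a JDK 8 is installed, a short config sketch for pointing JAVA_HOME at it (macOS ships the java_home helper for exactly this; assumes a 1.8 JDK is present):

```shell
# Ask macOS for the home directory of an installed 1.8 JDK
# and export it so pyspark picks up that runtime
export JAVA_HOME="$(/usr/libexec/java_home -v 1.8)"
echo "$JAVA_HOME"
```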

After that I was able to start a Spark context with pyspark.

In Python:

from pyspark import SparkContext 
sc = SparkContext.getOrCreate() 

# check that it really works by running a job
# example from http://spark.apache.org/docs/latest/rdd-programming-guide.html#parallelized-collections
data = range(10000) 
distData = sc.parallelize(data)
distData.filter(lambda x: not x&1).take(10)
# Out: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

Note that you might need to set the environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON, and they have to point to the same Python version as the Python (or IPython) you're using to run pyspark (the driver).
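One way to keep the two variables in sync is to set them from inside the driver script before pyspark is imported. A small sketch (sys.executable is simply whatever interpreter is running the script, so driver and workers agree):

```python
import os
import sys

# Point both the worker Python and the driver Python at the
# interpreter running this script, so the versions always match.
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

print(os.environ["PYSPARK_PYTHON"])
```

These assignments must run before the SparkContext is created, since pyspark reads the variables at launch.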

