PySpark exception: Java gateway process exited before sending its port number


Problem description

I run Windows 10 and installed Python 3 through Anaconda3. I am using Jupyter Notebook. I have downloaded Spark from here (spark-2.3.0-bin-hadoop2.7.tgz). I have extracted the files and pasted them in my directory D:\Spark. I have amended the Environment Variables:

User variable:

Variable: SPARK_HOME

Value: D:\Spark

System variable:

Variable: PATH

Value: D:\Spark\bin
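Environment variables edited in the Windows dialog are only picked up by processes started afterwards, so it can help to confirm what the Jupyter kernel actually sees. The following is a minimal sketch, not part of the original question; the helper name check_spark_env is hypothetical, and it only assumes the two settings listed above (SPARK_HOME, and its bin folder on PATH).

```python
import os


def check_spark_env(environ):
    """Return a list of problems found; an empty list means the settings look consistent."""
    spark_home = environ.get("SPARK_HOME")
    if not spark_home:
        return ["SPARK_HOME is not set"]

    def norm(path):
        # normcase lowercases and unifies slashes on Windows; it is a no-op elsewhere
        return os.path.normcase(os.path.normpath(path))

    spark_bin = norm(os.path.join(spark_home, "bin"))
    path_entries = [
        norm(entry)
        for entry in environ.get("PATH", "").split(os.pathsep)
        if entry.strip()
    ]
    problems = []
    if spark_bin not in path_entries:
        problems.append("PATH does not contain " + spark_bin)
    return problems


# Run inside the notebook to inspect the kernel's own environment
for problem in check_spark_env(os.environ):
    print(problem)
```

If this prints nothing in the notebook, the kernel sees both settings as described; if it reports a problem, restarting Jupyter from a fresh terminal is worth trying before changing anything else.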

I have installed/updated via conda the following modules:

pandas

numpy

pyarrow

pyspark

py4j
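Packages installed with conda land in one specific environment, and the notebook kernel may be running from a different one. As a rough sketch (the helper name missing_modules is hypothetical and not from the original post), the modules listed above can be checked from inside the notebook without importing them:

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The module list is taken from the question above
print(missing_modules(["pandas", "numpy", "pyarrow", "pyspark", "py4j"]))
```

An empty list means every module is visible to the interpreter that runs the kernel.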

Java is installed:

I don't know if this is relevant but in my Environment Variables the following two variables appear:

Having done all this, I rebooted and ran the following piece of code, which results in the error message pasted below:

import pandas as pd

import seaborn as sns

# These lines enable the run of spark commands

from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession
sc = SparkContext('local')
spark = SparkSession(sc)

import pyspark

data = sns.load_dataset('iris')

data_sp = spark.createDataFrame(data)

data_sp.show()

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-1-ec964ecd39a2> in <module>()
      7 from pyspark.context import SparkContext
      8 from pyspark.sql.session import SparkSession
----> 9 sc = SparkContext('local')
     10 spark = SparkSession(sc)
     11 

C:\ProgramData\Anaconda3\lib\site-packages\pyspark\context.py in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
    113         """
    114         self._callsite = first_spark_call() or CallSite(None, None, None)
--> 115         SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
    116         try:
    117             self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,

C:\ProgramData\Anaconda3\lib\site-packages\pyspark\context.py in _ensure_initialized(cls, instance, gateway, conf)
    296         with SparkContext._lock:
    297             if not SparkContext._gateway:
--> 298                 SparkContext._gateway = gateway or launch_gateway(conf)
    299                 SparkContext._jvm = SparkContext._gateway.jvm
    300 

C:\ProgramData\Anaconda3\lib\site-packages\pyspark\java_gateway.py in launch_gateway(conf)
     92 
     93             if not os.path.isfile(conn_info_file):
---> 94                 raise Exception("Java gateway process exited before sending its port number")
     95 
     96             with open(conn_info_file, "rb") as info:

Exception: Java gateway process exited before sending its port number

How can I make PySpark work?

Solution

I resolved the problem following the instructions to be found here: https://changhsinlee.com/install-pyspark-windows-jupyter/
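The answer does not say which step of the linked guide made the difference. As a general observation rather than something stated in the answer, this exception is raised when the JVM that PySpark tries to launch exits early, and a frequent reason is that Java cannot be found. The sketch below is an assumption-laden diagnostic: the helper name find_java is hypothetical, and it relies on the understanding that Spark's launcher scripts use JAVA_HOME when it is set and otherwise fall back to the java found on PATH.

```python
import os
import shutil
import subprocess


def find_java(environ=os.environ):
    """Return the java launcher Spark would probably use, or None if none is found."""
    java_home = environ.get("JAVA_HOME")
    if java_home:
        exe = "java.exe" if os.name == "nt" else "java"
        candidate = os.path.join(java_home, "bin", exe)
        # A JAVA_HOME that points at the wrong folder counts as "not found"
        return candidate if os.path.isfile(candidate) else None
    return shutil.which("java", path=environ.get("PATH", ""))


if __name__ == "__main__":
    java = find_java()
    if java is None:
        print("No usable java found via JAVA_HOME or PATH")
    else:
        # java -version writes its report to stderr
        result = subprocess.run([java, "-version"], capture_output=True, text=True)
        print(java)
        print(result.stderr.strip())
```

If no Java is reported, fixing JAVA_HOME or PATH and restarting Jupyter is a reasonable first thing to try before rerunning the SparkContext code.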
