Pyspark: SparkContext definition in Spyder throws Java gateway error


Problem description

I would like to use Spyder with pyspark (spark-2.1.1), but I cannot fix a rather frustrating Java error. I launch Spyder from the command line in Windows 10 after activating a conda environment (the Python version is 3.5.3). This is my code:

import pyspark
sc = pyspark.SparkContext("local")
file = sc.textFile("C:/test.log")
words = file.flatMap(lambda line : line.split(" "))
words.count()

When I try to define sc, I get the following error:

  File "D:\spark-2.1.1-bin-hadoop2.7\python\pyspark\java_gateway.py", line 95, in launch_gateway
    raise Exception("Java gateway process exited before sending the driver its port number")

Exception: Java gateway process exited before sending the driver its port number

For completeness:

  1. If I run pyspark from the command line after activating the conda environment, it works and correctly performs the word count task.

  2. If I launch Spyder App Desktop from the Start Menu in Windows 10, everything works (but I think I cannot load the right Python modules from my conda environment in this case).

The related environment variables seem to be ok:

echo %SPARK_HOME%
D:\spark-2.1.1-bin-hadoop2.7

echo %JAVA_HOME%
C:\Java\jdk1.8.0_121

echo %PYTHONPATH%
D:\spark-2.1.1-bin-hadoop2.7\python;D:\spark-2.1.1-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip;D:\spark-2.1.1-bin-hadoop2.7\python\lib;C:\Users\user\Anaconda3
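As a minimal sanity check (a sketch for illustration, not part of the original report), the same entries can also be verified from inside Python by checking where pyspark and py4j resolve to:

import pyspark
import py4j

print(pyspark.__file__)  # expected under D:\spark-2.1.1-bin-hadoop2.7\python
print(py4j.__file__)     # expected under the py4j-0.10.4-src.zip entry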

I have already tried the solutions proposed here, but nothing worked for me. Any suggestion is greatly appreciated!

Recommended answer

Since 1) is working, it is probably best to use the conda environment in Spyder.

In Preferences, go to the "Python Interpreter" section and switch from "Default (i.e. the same as Spyder's)" to "Use the following Python interpreter".

If your environment is called spark_env and Anaconda is installed under C:\Program Files\Continuum\Anaconda, the Python interpreter corresponding to this environment is C:\Program Files\Continuum\Anaconda\envs\spark_env\python.exe.

A Python console started in Spyder after this change will run in your conda environment (note that this does not apply to IPython consoles).
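A quick way to confirm which interpreter the console is actually using (a small check for illustration, not from the original answer) is to print sys.executable:

import sys

# Should point to the interpreter of the selected conda environment, e.g.
# C:\Program Files\Continuum\Anaconda\envs\spark_env\python.exe
print(sys.executable)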

To check environment variables, you can use python code to make sure these are the same variables your script sees:

from os import environ

print(environ['SPARK_HOME'])
print(environ['JAVA_HOME'])
try:
    print(environ['PYSPARK_SUBMIT_ARGS'])
except KeyError:
    # PYSPARK_SUBMIT_ARGS is optional; see
    # https://github.com/ContinuumIO/anaconda-issues/issues/1276#issuecomment-277355043
    print("no problem with PYSPARK_SUBMIT_ARGS")

Hope that helps.
