KeyError: SPARK_HOME during SparkConf initialization


Problem Description


I am a Spark newbie and I want to run a Python script from the command line. I have tested pyspark interactively and it works. I get this error when trying to create the SparkContext (sc):

File "test.py", line 10, in <module>
    conf=(SparkConf().setMaster('local').setAppName('a').setSparkHome('/home/dirk/spark-1.4.1-bin-hadoop2.6/bin'))
  File "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/pyspark/conf.py", line 104, in __init__
    SparkContext._ensure_initialized()
  File "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/pyspark/context.py", line 229, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/pyspark/java_gateway.py", line 48, in launch_gateway
    SPARK_HOME = os.environ["SPARK_HOME"]
  File "/usr/lib/python2.7/UserDict.py", line 23, in __getitem__
    raise KeyError(key)
KeyError: 'SPARK_HOME'

Solution

It seems like there are two problems here.

The first one is the path you use. SPARK_HOME should point to the root directory of the Spark installation, so in your case it should probably be /home/dirk/spark-1.4.1-bin-hadoop2.6, not /home/dirk/spark-1.4.1-bin-hadoop2.6/bin.
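A quick way to confirm the path really is the installation root (a sketch, assuming the standard binary-distribution layout that keeps the launcher scripts under bin/):

import os

spark_home = "/home/dirk/spark-1.4.1-bin-hadoop2.6"
# a Spark binary distribution ships bin/spark-submit under its root directory
print(os.path.isfile(os.path.join(spark_home, "bin", "spark-submit")))  # expect True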

The second problem is the way you use setSparkHome. If you check its docstring, its goal is to

set path where Spark is installed on worker nodes

The SparkConf constructor assumes that SPARK_HOME on the master is already set. It calls pyspark.context.SparkContext._ensure_initialized, which calls pyspark.java_gateway.launch_gateway, which tries to access SPARK_HOME and fails.
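For illustration only (this is not the pyspark source, just the same lookup pattern): a plain os.environ access raises KeyError when the variable is absent, which is exactly the exception in the traceback above.

import os

try:
    spark_home = os.environ["SPARK_HOME"]  # same style of lookup as launch_gateway
except KeyError:
    print("SPARK_HOME is not set, so SparkConf() fails before doing any real work")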

To deal with this you should set SPARK_HOME before you create SparkConf.

from pyspark import SparkConf
import os

# make sure SPARK_HOME is in the environment before SparkConf() runs
os.environ["SPARK_HOME"] = "/home/dirk/spark-1.4.1-bin-hadoop2.6"
conf = (SparkConf().setMaster('local').setAppName('a'))
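With SPARK_HOME exported up front, the rest of the script can create the context as usual; a minimal continuation of the snippet above might look like this:

from pyspark import SparkContext

sc = SparkContext(conf=conf)
print(sc.version)  # sanity check, e.g. 1.4.1 for this installation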
