KeyError: SPARK_HOME during SparkConf initialization
Problem Description
I am a Spark newbie and I want to run a Python script from the command line. I have tested pyspark interactively and it works. I get this error when trying to create the sc:

  File "test.py", line 10, in <module>
    conf=(SparkConf().setMaster('local').setAppName('a').setSparkHome('/home/dirk/spark-1.4.1-bin-hadoop2.6/bin'))
  File "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/pyspark/conf.py", line 104, in __init__
    SparkContext._ensure_initialized()
  File "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/pyspark/context.py", line 229, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/pyspark/java_gateway.py", line 48, in launch_gateway
    SPARK_HOME = os.environ["SPARK_HOME"]
  File "/usr/lib/python2.7/UserDict.py", line 23, in __getitem__
    raise KeyError(key)
KeyError: 'SPARK_HOME'
Solution

It seems like there are two problems here.
The first one is the path you use. SPARK_HOME should point to the root directory of the Spark installation, so in your case it should probably be /home/dirk/spark-1.4.1-bin-hadoop2.6, not /home/dirk/spark-1.4.1-bin-hadoop2.6/bin.
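As a quick sanity check (an illustrative addition, not part of the original answer), the installation root is the directory that contains bin/spark-submit, so a candidate path can be verified before using it:

    import os

    # Illustrative check: the Spark installation root should contain bin/spark-submit.
    spark_home = "/home/dirk/spark-1.4.1-bin-hadoop2.6"
    print(os.path.isfile(os.path.join(spark_home, "bin", "spark-submit")))  # expected: True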
The second problem is the way you use setSparkHome. If you check its docstring (https://github.com/apache/spark/blob/3c0156899dc1ec1f7dfe6d7c8af47fa6dc7d00bf/python/pyspark/conf.py#L130), its goal is to

    set path where Spark is installed on worker nodes

The SparkConf constructor assumes that SPARK_HOME on the master is already set. It calls pyspark.context.SparkContext._ensure_initialized (https://github.com/apache/spark/blob/3c0156899dc1ec1f7dfe6d7c8af47fa6dc7d00bf/python/pyspark/conf.py#L104), which calls pyspark.java_gateway.launch_gateway, which in turn tries to access SPARK_HOME (https://github.com/apache/spark/blob/49351c7f597c67950cc65e5014a89fad31b9a6f7/python/pyspark/java_gateway.py#L48) and fails.
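For illustration only: launch_gateway reads the variable with a plain dictionary-style lookup, which is why a missing SPARK_HOME surfaces as the KeyError shown in the traceback. A minimal sketch of that behaviour:

    import os

    # Simulate an environment without SPARK_HOME, then repeat the lookup that
    # launch_gateway() performs (see java_gateway.py, line 48 in the traceback).
    os.environ.pop("SPARK_HOME", None)
    try:
        spark_home = os.environ["SPARK_HOME"]
    except KeyError as err:
        print(repr(err))  # the same KeyError: 'SPARK_HOME'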
To deal with this you should set SPARK_HOME before you create SparkConf:

    import os

    os.environ["SPARK_HOME"] = "/home/dirk/spark-1.4.1-bin-hadoop2.6"
    conf = (SparkConf().setMaster('local').setAppName('a'))
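For completeness, a minimal end-to-end sketch of what test.py could look like; the pyspark imports and the SparkContext creation are assumed from the question, and the path is the one from the question:

    import os

    # Set SPARK_HOME before building SparkConf, so launch_gateway() can find it.
    os.environ["SPARK_HOME"] = "/home/dirk/spark-1.4.1-bin-hadoop2.6"

    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setMaster('local').setAppName('a')
    sc = SparkContext(conf=conf)  # the step that previously raised KeyError
    print(sc.version)
    sc.stop()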