Unable to import SparkContext
Problem description
I'm working on CentOS. I've set up $SPARK_HOME and also added the path to bin in $PATH.
I can run pyspark from anywhere.
But when I try to create a Python file and use this statement:
from pyspark import SparkConf, SparkContext
it raises the following error:
python pysparktask.py

Traceback (most recent call last):
  File "pysparktask.py", line 1, in <module>
    from pyspark import SparkConf, SparkContext
ModuleNotFoundError: No module named 'pyspark'
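The ModuleNotFoundError means the interpreter searched every entry on sys.path and found no pyspark package. A quick way to confirm this without triggering the exception is a minimal sketch like the following (importable is a hypothetical helper name, not part of any library):

```python
import importlib.util

# find_spec searches sys.path the same way `import` does and returns
# None when the module cannot be located -- the exact condition that
# raises ModuleNotFoundError.
def importable(name):
    return importlib.util.find_spec(name) is not None

print(importable("os"))       # True: part of the standard library
print(importable("pyspark"))  # False in the environment above
```

If this prints False for pyspark, the fix is to put Spark's python directory on sys.path (or in PYTHONPATH), as the answer below does.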
I tried to install it again using pip:
pip install pyspark
It also gives this error:
Could not find a version that satisfies the requirement pyspark (from versions: ) No matching distribution found for pyspark
Edit
Based on the answer, I updated the code.
The error is:
Traceback (most recent call last):
  File "pysparktask.py", line 6, in <module>
    from pyspark import SparkConf, SparkContext
  File "/opt/mapr/spark/spark-2.0.1/python/pyspark/__init__.py", line 44, in <module>
    from pyspark.context import SparkContext
  File "/opt/mapr/spark/spark-2.0.1/python/pyspark/context.py", line 33, in <module>
    from pyspark.java_gateway import launch_gateway
  File "/opt/mapr/spark/spark-2.0.1/python/pyspark/java_gateway.py", line 31, in <module>
    from py4j.java_gateway import java_import, JavaGateway, GatewayClient
ModuleNotFoundError: No module named 'py4j'
Recommended answer
Add the following environment variable, and also append Spark's python directory to sys.path:
import os
import sys
os.environ['SPARK_HOME'] = "/usr/lib/spark/"
sys.path.append("/usr/lib/spark/python/")
from pyspark import SparkConf, SparkContext # And then try to import SparkContext.
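The follow-up error in the edit occurs because pyspark depends on py4j, which Spark bundles inside its own distribution as a zip under $SPARK_HOME/python/lib rather than installing it site-wide. A minimal sketch that puts both pyspark and the bundled py4j on sys.path (add_spark_paths is a hypothetical helper; the path matches the traceback above, so adjust it to your installation):

```python
import glob
import os
import sys

def add_spark_paths(spark_home):
    """Append Spark's Python sources and its bundled py4j zip to sys.path."""
    paths = [os.path.join(spark_home, "python")]
    # py4j ships inside the Spark distribution as python/lib/py4j-*.zip
    paths += glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*.zip"))
    sys.path.extend(paths)
    return paths

add_spark_paths("/opt/mapr/spark/spark-2.0.1")  # path from the traceback above
# from pyspark import SparkConf, SparkContext  # should now resolve both modules
```

Alternatively, installing py4j separately (pip install py4j) makes the bundled zip unnecessary, since pip can find py4j on PyPI even when it cannot find pyspark.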