Unable to import SparkContext


Problem description

I'm working on CentOS. I've set up $SPARK_HOME and also added the bin path to $PATH.

I can run pyspark from anywhere.

But when I create a Python file that uses this statement:

from pyspark import SparkConf, SparkContext

it throws the following error:

python pysparktask.py
Traceback (most recent call last):
  File "pysparktask.py", line 1, in <module>
    from pyspark import SparkConf, SparkContext
ModuleNotFoundError: No module named 'pyspark'

I tried to install it again using pip:

pip install pyspark

It also gives this error:

Could not find a version that satisfies the requirement pyspark (from versions: ) No matching distribution found for pyspark
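
(As far as I can tell, pyspark simply was not published to PyPI at the time, which is why pip reports an empty version list; Spark releases from 2.1.1 onward are on PyPI, so pip install pyspark works on newer versions. With a vendor bundle such as the MapR Spark in the traceback below, putting the bundled copy on sys.path, as in the answer, is the more reliable route.)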

Edit

Based on the answer, I updated the code.

The error now is:

Traceback (most recent call last):
  File "pysparktask.py", line 6, in <module>
    from pyspark import SparkConf, SparkContext
  File "/opt/mapr/spark/spark-2.0.1/python/pyspark/__init__.py", line 44, in <module>
    from pyspark.context import SparkContext
  File "/opt/mapr/spark/spark-2.0.1/python/pyspark/context.py", line 33, in <module>
    from pyspark.java_gateway import launch_gateway
  File "/opt/mapr/spark/spark-2.0.1/python/pyspark/java_gateway.py", line 31, in <module>
    from py4j.java_gateway import java_import, JavaGateway, GatewayClient
ModuleNotFoundError: No module named 'py4j'

Answer

Add the following environment variable, and also append Spark's lib path to sys.path:

import os
import sys

os.environ['SPARK_HOME'] = "/usr/lib/spark/"
sys.path.append("/usr/lib/spark/python/")

from pyspark import SparkConf, SparkContext # And then try to import SparkContext.
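
The follow-up error from the edit (No module named 'py4j') happens because Spark bundles its py4j dependency as a zip under python/lib, and that zip needs to be on sys.path as well (or py4j installed separately with pip install py4j). A minimal sketch, assuming the MapR layout from the traceback; the py4j version embedded in the zip's filename varies between Spark releases:

import glob
import os
import sys

spark_home = "/opt/mapr/spark/spark-2.0.1"  # adjust to your installation
os.environ["SPARK_HOME"] = spark_home
sys.path.append(os.path.join(spark_home, "python"))
# Spark ships py4j as python/lib/py4j-<version>-src.zip; glob over the
# version so the path keeps working across Spark upgrades.
sys.path.extend(glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")))

from pyspark import SparkConf, SparkContext

Alternatively, the findspark package automates this lookup: pip install findspark, then call findspark.init() before importing pyspark.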
