PySpark SparkContext Name Error 'sc' in jupyter


Problem Description

I am new to PySpark and want to use it from an IPython notebook on my Ubuntu 12.04 machine. Below is my configuration for PySpark and the IPython notebook.

sparkuser@Ideapad:~$ echo $JAVA_HOME
/usr/lib/jvm/java-8-oracle

# Path for Spark
sparkuser@Ideapad:~$ ls /home/sparkuser/spark/
bin    CHANGES.txt  data  examples  LICENSE   NOTICE  R          RELEASE  scala-2.11.6.deb
build  conf         ec2   lib       licenses  python  README.md  sbin     spark-1.5.2-bin-hadoop2.6.tgz

I installed Anaconda2 4.0.0; the Anaconda path is:

sparkuser@Ideapad:~$ ls anaconda2/
bin  conda-meta  envs  etc  Examples  imports  include  lib  LICENSE.txt  mkspecs  pkgs  plugins  share  ssl  tests

Create a PySpark profile for IPython:

ipython profile create pyspark

sparkuser@Ideapad:~$ cat .bashrc

export SPARK_HOME="$HOME/spark"
export PYSPARK_SUBMIT_ARGS="--master local[2]"
# added by Anaconda2 4.0.0 installer
export PATH="/home/sparkuser/anaconda2/bin:$PATH"

Create a file named ~/.ipython/profile_pyspark/startup/00-pyspark-setup.py:

sparkuser@Ideapad:~$ cat .ipython/profile_pyspark/startup/00-pyspark-setup.py
import os
import sys

# Put Spark's Python bindings and the bundled py4j library on the module path
spark_home = os.environ.get('SPARK_HOME', None)
sys.path.insert(0, spark_home + "/python")
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.8.2.1-src.zip'))

# Execute pyspark's shell.py, which creates the SparkContext (sc)
filename = os.path.join(spark_home, 'python/pyspark/shell.py')
exec(compile(open(filename, "rb").read(), filename, 'exec'))

spark_release_file = spark_home + "/RELEASE"

# For Spark 1.5.2, make sure "pyspark-shell" is included in PYSPARK_SUBMIT_ARGS
if os.path.exists(spark_release_file) and "Spark 1.5.2" in open(spark_release_file).read():
    pyspark_submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "")
    if "pyspark-shell" not in pyspark_submit_args:
        pyspark_submit_args += " pyspark-shell"
        os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args

Starting the pyspark shell in a terminal:

sparkuser@Ideapad:~$ ~/spark/bin/pyspark
Python 2.7.11 |Anaconda 4.0.0 (64-bit)| (default, Dec  6 2015, 18:08:32) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/04/22 21:06:55 INFO SparkContext: Running Spark version 1.5.2
16/04/22 21:07:27 INFO BlockManagerMaster: Registered BlockManager
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.5.2
      /_/

Using Python version 2.7.11 (default, Dec  6 2015 18:08:32)
SparkContext available as sc, HiveContext available as sqlContext.
>>> sc
<pyspark.context.SparkContext object at 0x7facb75b50d0>
>>>

When I run the command below, a Jupyter notebook opens in the browser:

sparkuser@Ideapad:~$ ipython notebook --profile=pyspark
[TerminalIPythonApp] WARNING | Subcommand `ipython notebook` is deprecated and will be removed in future versions.
[TerminalIPythonApp] WARNING | You likely want to use `jupyter notebook`... continue in 5 sec. Press Ctrl-C to quit now.
[W 21:32:08.070 NotebookApp] Unrecognized alias: '--profile=pyspark', it will probably have no effect.
[I 21:32:08.111 NotebookApp] Serving notebooks from local directory: /home/sparkuser
[I 21:32:08.111 NotebookApp] 0 active kernels 
[I 21:32:08.111 NotebookApp] The Jupyter Notebook is running at: http://localhost:8888/
[I 21:32:08.111 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
Created new window in existing browser session.

In the browser, if I type the following command, it throws a NameError:

In [ ]: print sc
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-2-ee8101b8fe58> in <module>()
----> 1 print sc
NameError: name 'sc' is not defined

When I run the above command in the pyspark terminal, it produces the expected output, but when I run the same command in Jupyter it throws the error above.

Above are my configuration settings for PySpark and IPython. How do I configure PySpark to work with Jupyter?

Recommended Answer

Here is one workaround; I would suggest you try it, without depending on pyspark to load the context for you:

!pip install findspark

Then simply import findspark and initialize a SparkContext:

import findspark
import os

findspark.init()  # locates Spark via SPARK_HOME and adds it to sys.path

import pyspark

sc = pyspark.SparkContext()
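
Once sc is defined this way, a quick sanity check in a notebook cell confirms the context is usable; the small job below is only illustrative:

print(sc.version)                          # e.g. 1.5.2 for the install shown above
print(sc.parallelize(range(100)).count())  # should print 100
sc.stop()                                  # stop the context before creating a new one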

Reference: https://pypi.python.org/pypi/findspark
