PySpark SparkContext Name Error 'sc' in jupyter
Question
I am new to pyspark and want to use it from an IPython notebook on my Ubuntu 12.04 machine. Below is the configuration for pyspark and the IPython notebook.
sparkuser@Ideapad:~$ echo $JAVA_HOME
/usr/lib/jvm/java-8-oracle
# Path for Spark
sparkuser@Ideapad:~$ ls /home/sparkuser/spark/
bin CHANGES.txt data examples LICENSE NOTICE R RELEASE scala-2.11.6.deb
build conf ec2 lib licenses python README.md sbin spark-1.5.2-bin-hadoop2.6.tgz
I installed Anaconda2 4.0.0; the anaconda path:
sparkuser@Ideapad:~$ ls anaconda2/
bin conda-meta envs etc Examples imports include lib LICENSE.txt mkspecs pkgs plugins share ssl tests
Create a PySpark profile for IPython:
ipython profile create pyspark
sparkuser@Ideapad:~$ cat .bashrc
export SPARK_HOME="$HOME/spark"
export PYSPARK_SUBMIT_ARGS="--master local[2]"
# added by Anaconda2 4.0.0 installer
export PATH="/home/sparkuser/anaconda2/bin:$PATH"
Create a file named ~/.ipython/profile_pyspark/startup/00-pyspark-setup.py:
sparkuser@Ideapad:~$ cat .ipython/profile_pyspark/startup/00-pyspark-setup.py
import os
import sys
spark_home = os.environ.get('SPARK_HOME', None)
sys.path.insert(0, spark_home + "/python")
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.8.2.1-src.zip'))
filename = os.path.join(spark_home, 'python/pyspark/shell.py')
exec(compile(open(filename, "rb").read(), filename, 'exec'))
spark_release_file = spark_home + "/RELEASE"
if os.path.exists(spark_release_file) and "Spark 1.5.2" in open(spark_release_file).read():
    pyspark_submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "")
    if not "pyspark-shell" in pyspark_submit_args:
        pyspark_submit_args += " pyspark-shell"
    os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args
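The version check and `PYSPARK_SUBMIT_ARGS` logic at the end of the startup file can be exercised on its own, without a Spark installation. A minimal sketch, in which a temporary directory stands in for `SPARK_HOME` (the fake `RELEASE` contents are illustrative, not from a real Spark distribution):

```python
import os
import tempfile

# Fake SPARK_HOME containing a RELEASE file, as the startup script expects.
spark_home = tempfile.mkdtemp()
with open(os.path.join(spark_home, "RELEASE"), "w") as f:
    f.write("Spark 1.5.2 built for Hadoop 2.6.0")

# Simulate the value exported in .bashrc.
os.environ["PYSPARK_SUBMIT_ARGS"] = "--master local[2]"

# Same logic as 00-pyspark-setup.py: append "pyspark-shell" if it is missing.
spark_release_file = spark_home + "/RELEASE"
if os.path.exists(spark_release_file) and "Spark 1.5.2" in open(spark_release_file).read():
    pyspark_submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "")
    if "pyspark-shell" not in pyspark_submit_args:
        pyspark_submit_args += " pyspark-shell"
    os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args

print(os.environ["PYSPARK_SUBMIT_ARGS"])
```

Running it shows that `pyspark-shell` is appended exactly once, which is what allows the embedded `shell.py` launch to work on Spark 1.4+.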
Launching the pyspark shell:
sparkuser@Ideapad:~$ ~/spark/bin/pyspark
Python 2.7.11 |Anaconda 4.0.0 (64-bit)| (default, Dec 6 2015, 18:08:32)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/04/22 21:06:55 INFO SparkContext: Running Spark version 1.5.2
16/04/22 21:07:27 INFO BlockManagerMaster: Registered BlockManager
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.5.2
      /_/
Using Python version 2.7.11 (default, Dec 6 2015 18:08:32)
SparkContext available as sc, HiveContext available as sqlContext.
>>> sc
<pyspark.context.SparkContext object at 0x7facb75b50d0>
>>>
When I run the command below, a jupyter browser window opens:
sparkuser@Ideapad:~$ ipython notebook --profile=pyspark
[TerminalIPythonApp] WARNING | Subcommand `ipython notebook` is deprecated and will be removed in future versions.
[TerminalIPythonApp] WARNING | You likely want to use `jupyter notebook`... continue in 5 sec. Press Ctrl-C to quit now.
[W 21:32:08.070 NotebookApp] Unrecognized alias: '--profile=pyspark', it will probably have no effect.
[I 21:32:08.111 NotebookApp] Serving notebooks from local directory: /home/sparkuser
[I 21:32:08.111 NotebookApp] 0 active kernels
[I 21:32:08.111 NotebookApp] The Jupyter Notebook is running at: http://localhost:8888/
[I 21:32:08.111 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
Created new window in existing browser session.
In the browser, if I type the following command, it throws a NameError:
In [ ]: print sc
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-2-ee8101b8fe58> in <module>()
----> 1 print sc
NameError: name 'sc' is not defined
When I run the same command in the pyspark terminal, it prints the expected output, but when I run it in jupyter it throws the above error.
Above are my configuration settings for pyspark and IPython. How do I configure pyspark to work with jupyter?
Recommended answer
Here is one workaround; I would suggest you try it rather than depending on pyspark to load the context for you. First, install findspark:

!pip install findspark
Then simply import it and initialize the SparkContext:
import findspark
import os
findspark.init()
import pyspark
sc = pyspark.SparkContext()
Reference: https://pypi.python.org/pypi/findspark
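Under the hood, `findspark.init()` does roughly what the 00-pyspark-setup.py file above does by hand: it locates `SPARK_HOME` and puts Spark's `python/` directory and the bundled py4j zip on `sys.path`. A hypothetical sketch of that behavior follows; the `init_spark_paths` helper and the fake directory layout are assumptions for illustration (modeled on the Spark listing above), not findspark's actual source:

```python
import glob
import os
import sys
import tempfile

def init_spark_paths(spark_home):
    """Put Spark's Python bindings on sys.path, findspark-style."""
    python_dir = os.path.join(spark_home, "python")
    # The py4j zip name is version-specific, so glob for it.
    py4j_zips = glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip"))
    sys.path.insert(0, python_dir)
    for zip_path in py4j_zips:
        sys.path.insert(0, zip_path)
    return [python_dir] + py4j_zips

# Exercise it against a fake SPARK_HOME mimicking the listing above.
home = tempfile.mkdtemp()
os.makedirs(os.path.join(home, "python", "lib"))
open(os.path.join(home, "python", "lib", "py4j-0.8.2.1-src.zip"), "w").close()
added = init_spark_paths(home)
print(added)
```

Once the real paths are on `sys.path`, `import pyspark` succeeds in any notebook kernel, which is why `pyspark.SparkContext()` then works without the custom IPython profile.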