Link Spark with iPython Notebook
Problem Description
I have followed some tutorials online but they do not work with Spark 1.5.1 on OS X El Capitan (10.11).
Basically I have run these commands to download apache-spark:
brew update
brew install scala
brew install apache-spark
updated the .bash_profile:
# For a ipython notebook and pyspark integration
if which pyspark > /dev/null; then
export SPARK_HOME="/usr/local/Cellar/apache-spark/1.5.1/libexec/"
export PYSPARK_SUBMIT_ARGS="--master local[2]"
fi
then ran:
ipython profile create pyspark
and created a startup file ~/.ipython/profile_pyspark/startup/00-pyspark-setup.py configured in this way:
# Configure the necessary Spark environment
import os
import sys

# Spark home
spark_home = os.environ.get("SPARK_HOME")

# If Spark V1.4.x is detected, then add ' pyspark-shell' to
# the end of the 'PYSPARK_SUBMIT_ARGS' environment variable
spark_release_file = spark_home + "/RELEASE"
if os.path.exists(spark_release_file) and "Spark 1.4" in open(spark_release_file).read():
    pyspark_submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "")
    if "pyspark-shell" not in pyspark_submit_args:
        pyspark_submit_args += " pyspark-shell"
    os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args

# Add the spark python sub-directory to the path
sys.path.insert(0, spark_home + "/python")

# Add the py4j to the path.
# You may need to change the version number to match your install
sys.path.insert(0, os.path.join(spark_home, "python/lib/py4j-0.8.2.1-src.zip"))

# Initialize PySpark to predefine the SparkContext variable 'sc'
# (execfile is Python 2 only; on Python 3 use exec(open(path).read()))
execfile(os.path.join(spark_home, "python/pyspark/shell.py"))
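The version check in the startup file can be exercised on its own. Here is a minimal sketch, assuming nothing about a real Spark install (the temporary directory and the `ensure_pyspark_shell` helper are illustrative, not part of pyspark):

```python
import os
import tempfile

def ensure_pyspark_shell(spark_home, env):
    """Mimic the startup file: append ' pyspark-shell' to
    PYSPARK_SUBMIT_ARGS when a Spark 1.4.x RELEASE file is present."""
    release = os.path.join(spark_home, "RELEASE")
    if os.path.exists(release) and "Spark 1.4" in open(release).read():
        args = env.get("PYSPARK_SUBMIT_ARGS", "")
        if "pyspark-shell" not in args:
            env["PYSPARK_SUBMIT_ARGS"] = args + " pyspark-shell"
    return env

# Fake a Spark 1.4 layout in a temporary directory
fake_home = tempfile.mkdtemp()
with open(os.path.join(fake_home, "RELEASE"), "w") as f:
    f.write("Spark 1.4.1 built for Hadoop 2.6.0")

env = ensure_pyspark_shell(fake_home, {"PYSPARK_SUBMIT_ARGS": "--master local[2]"})
print(env["PYSPARK_SUBMIT_ARGS"])  # --master local[2] pyspark-shell
```

For Spark 1.5.1 the RELEASE file reads "Spark 1.5.1 ...", so the branch is skipped and PYSPARK_SUBMIT_ARGS is left untouched, which is exactly what the setup above relies on.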
I then run ipython notebook --profile=pyspark and the notebook works fine, but sc (the Spark context) is not recognised.
Has anyone managed to do this with Spark 1.5.1?
You can follow this guide to get it working: https://gist.github.com/tommycarpi/f5a67c66a8f2170e263c
Recommended Answer
I have Jupyter installed, and indeed it is simpler than you think:
- Install Anaconda for OS X.
- Install Jupyter by typing the next line in your terminal:
ilovejobs@mymac:~$ conda install jupyter
- Update Jupyter just in case.
ilovejobs@mymac:~$ conda update jupyter
- Download Apache Spark and compile it, or download and uncompress Apache Spark 1.5.1 + Hadoop 2.6.
ilovejobs@mymac:~$ cd Downloads
ilovejobs@mymac:~/Downloads$ wget http://www.apache.org/dyn/closer.lua/spark/spark-1.5.1/spark-1.5.1-bin-hadoop2.6.tgz
ilovejobs@mymac:~/Downloads$ tar -xzf spark-1.5.1-bin-hadoop2.6.tgz
ilovejobs@mymac:~/Downloads$ mv spark-1.5.1-bin-hadoop2.6 spark-1.5.1    # the archive extracts to spark-1.5.1-bin-hadoop2.6; rename it for brevity
- Create an Apps folder in your home directory:
ilovejobs@mymac:~/Downloads$ mkdir ~/Apps
- Move the uncompressed folder spark-1.5.1 to the ~/Apps directory.
ilovejobs@mymac:~/Downloads$ mv spark-1.5.1/ ~/Apps
- Move to the ~/Apps directory and verify that spark is there.
ilovejobs@mymac:~/Downloads$ cd ~/Apps
ilovejobs@mymac:~/Apps$ ls -l
drwxr-xr-x ?? ilovejobs ilovejobs 4096 ?? ?? ??:?? spark-1.5.1
Here is the first tricky part. Add the spark binaries to your $PATH:
ilovejobs@mymac:~/Apps$ cd
ilovejobs@mymac:~$ echo 'export PATH=$HOME/Apps/spark-1.5.1/bin:$PATH' >> .profile
Here is the second tricky part. Also add these environment variables:
ilovejobs@mymac:~$ echo "export PYSPARK_DRIVER_PYTHON=ipython" >> .profile
ilovejobs@mymac:~$ echo "export PYSPARK_DRIVER_PYTHON_OPTS='notebook'" >> .profile
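After both tricky steps, the relevant block of ~/.profile should look something like this (the spark path assumes the ~/Apps/spark-1.5.1 layout from the earlier steps):

```shell
# Spark + Jupyter integration (appended by the echo commands above)
export PATH=$HOME/Apps/spark-1.5.1/bin:$PATH
export PYSPARK_DRIVER_PYTHON=ipython
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
```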
Source the profile to make these variables available in this terminal:
ilovejobs@mymac:~$ source .profile
- Create a ~/notebooks directory.
ilovejobs@mymac:~$ mkdir notebooks
- Move to ~/notebooks and run pyspark:
ilovejobs@mymac:~$ cd notebooks
ilovejobs@mymac:~/notebooks$ pyspark
Notice that you can also add those variables to the .bashrc located in your home.
Now be happy: you should be able to run Jupyter with a PySpark kernel (it will show as Python 2, but it will use Spark).
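As a quick sanity check inside the notebook, a tiny job confirms that sc really is wired up. The snippet below is a sketch: sc only exists in a notebook launched through pyspark, so it falls back to plain Python anywhere else (the fallback is purely illustrative):

```python
def square_sum(nums):
    """Plain-Python reference for the Spark job below."""
    return sum(n * n for n in nums)

try:
    # `sc` is the SparkContext predefined by pyspark's shell.py
    result = sc.parallelize(range(10)).map(lambda n: n * n).sum()  # noqa: F821
except NameError:
    # Outside a pyspark-launched notebook there is no `sc`
    result = square_sum(range(10))

print(int(result))  # 285
```

If the Spark branch runs and prints the same number, the PATH, driver variables, and SparkContext are all in place.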