Link Spark with iPython Notebook


Problem Description

I have followed some tutorials online, but they do not work with Spark 1.5.1 on OS X El Capitan (10.11).

Basically, I ran these commands to download apache-spark:

brew update
brew install scala
brew install apache-spark

Then I updated my .bash_profile:

# For ipython notebook and pyspark integration
if which pyspark > /dev/null; then
  export SPARK_HOME="/usr/local/Cellar/apache-spark/1.5.1/libexec/"
  export PYSPARK_SUBMIT_ARGS="--master local[2]"
fi

Then I ran:

ipython profile create pyspark

and created a startup file at ~/.ipython/profile_pyspark/startup/00-pyspark-setup.py configured like this:

# Configure the necessary Spark environment
import os
import sys

# Spark home
spark_home = os.environ.get("SPARK_HOME")

# If Spark V1.4.x is detected, then add ' pyspark-shell' to
# the end of the 'PYSPARK_SUBMIT_ARGS' environment variable
spark_release_file = spark_home + "/RELEASE"
if os.path.exists(spark_release_file) and "Spark 1.4" in open(spark_release_file).read():
    pyspark_submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "")
    if not "pyspark-shell" in pyspark_submit_args: pyspark_submit_args += " pyspark-shell"
    os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args

# Add the spark python sub-directory to the path
sys.path.insert(0, spark_home + "/python")

# Add the py4j to the path.
# You may need to change the version number to match your install
sys.path.insert(0, os.path.join(spark_home, "python/lib/py4j-0.8.2.1-src.zip"))

# Initialize PySpark to predefine the SparkContext variable 'sc'
execfile(os.path.join(spark_home, "python/pyspark/shell.py"))
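
Note that execfile only exists in Python 2; if the notebook kernel runs Python 3, a minimal equivalent for that last line (a sketch, assuming the same shell.py path) is:

# Python 3 has no execfile(); read and exec the file instead
with open(os.path.join(spark_home, "python/pyspark/shell.py")) as f:
    exec(f.read())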

I then run ipython notebook --profile=pyspark and the notebook works fine, but sc (the Spark context) is not recognised.
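
In a fresh notebook, for example, referencing sc in the first cell just raises a NameError instead of showing the SparkContext:

# First cell of the notebook
sc
# NameError: name 'sc' is not defined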

Has anyone managed to do this with Spark 1.5.1?

You can follow this guide to have it working:

https://gist.github.com/tommycarpi/f5a67c66a8f2170e263c

Answer

I have Jupyter installed, and indeed it is simpler than you think:

  1. Install Anaconda for OS X.
  2. Install jupyter by typing the next line in your terminal (click me for more info).

ilovejobs@mymac:~$ conda install jupyter

  • Update jupyter just in case.

    ilovejobs@mymac:~$ conda update jupyter
    

  • Download Apache Spark and compile it, or download and uncompress Apache Spark 1.5.1 + Hadoop 2.6.

    ilovejobs@mymac:~$ cd Downloads 
    ilovejobs@mymac:~/Downloads$ wget http://www.apache.org/dyn/closer.lua/spark/spark-1.5.1/spark-1.5.1-bin-hadoop2.6.tgz
    

  • Create an Apps folder in your home directory, i.e.:

    ilovejobs@mymac:~/Downloads$ mkdir ~/Apps
    

  • Move the uncompressed folder spark-1.5.1 to the ~/Apps directory:

    ilovejobs@mymac:~/Downloads$ mv spark-1.5.1/ ~/Apps
    

  • Move to the ~/Apps directory and verify that spark is there:

    ilovejobs@mymac:~/Downloads$ cd ~/Apps
    ilovejobs@mymac:~/Apps$ ls -l
    drwxr-xr-x ?? ilovejobs ilovejobs 4096 ?? ?? ??:?? spark-1.5.1
    

  • Here is the first tricky part. Add the spark binaries to your $PATH:

    ilovejobs@mymac:~/Apps$ cd
    ilovejobs@mymac:~$ echo "export $HOME/apps/spark/bin:$PATH" >> .profile
    

  • Here is the second tricky part. Also add these environment variables:

    ilovejobs@mymac:~$ echo "export PYSPARK_DRIVER_PYTHON=ipython" >> .profile
    ilovejobs@mymac:~$ echo "export PYSPARK_DRIVER_PYTHON_OPTS='notebook' pyspark" >> .profile
    

  • Source the profile so these variables are available in this terminal:

    ilovejobs@mymac:~$ source .profile
    

  • Create a ~/notebooks directory:

    ilovejobs@mymac:~$ mkdir notebooks
    

  • Move to ~/notebooks and run pyspark:

    ilovejobs@mymac:~$ cd notebooks
    ilovejobs@mymac:~/notebooks$ pyspark
    

  • Notice that you can also add those variables to the .bashrc located in your home directory. Now be happy: you should be able to run Jupyter with a PySpark kernel (it will show as Python 2, but it will use Spark).
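
  • As a quick sanity check (a minimal sketch, assuming pyspark launched the notebook and predefined sc), you can run something like this in the first cell:

    # Verify that the SparkContext is alive and can run a job
    sc.version                         # e.g. '1.5.1'
    sc.parallelize(range(10)).sum()    # should return 45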
