Link Spark with iPython Notebook


Question

I have followed some tutorials online, but they do not work with Spark 1.5.1 on OS X El Capitan (10.11).

Basically, I ran these commands to download apache-spark:

brew update
brew install scala
brew install apache-spark

updated my .bash_profile:

# For ipython notebook and pyspark integration
if which pyspark > /dev/null; then
  export SPARK_HOME="/usr/local/Cellar/apache-spark/1.5.1/libexec/"
  export PYSPARK_SUBMIT_ARGS="--master local[2]"
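  # ("local[2]" = run Spark in local mode with two worker threads)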
fi

then ran

ipython profile create pyspark

and created a startup file ~/.ipython/profile_pyspark/startup/00-pyspark-setup.py configured this way:

# Configure the necessary Spark environment
import os
import sys

# Spark home
spark_home = os.environ.get("SPARK_HOME")

# If Spark V1.4.x is detected, then add ' pyspark-shell' to
# the end of the 'PYSPARK_SUBMIT_ARGS' environment variable
spark_release_file = spark_home + "/RELEASE"
if os.path.exists(spark_release_file) and "Spark 1.4" in open(spark_release_file).read():
    pyspark_submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "")
    if "pyspark-shell" not in pyspark_submit_args:
        pyspark_submit_args += " pyspark-shell"
    os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args

# Add the spark python sub-directory to the path
sys.path.insert(0, spark_home + "/python")

# Add the py4j to the path.
# You may need to change the version number to match your install
sys.path.insert(0, os.path.join(spark_home, "python/lib/py4j-0.8.2.1-src.zip"))
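# (A hedged alternative: to avoid hardcoding the py4j version, glob for it,
# assuming exactly one py4j-*-src.zip ships under python/lib:)
# import glob
# sys.path.insert(0, glob.glob(os.path.join(spark_home, "python/lib/py4j-*-src.zip"))[0])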

# Initialize PySpark to predefine the SparkContext variable 'sc'
execfile(os.path.join(spark_home, "python/pyspark/shell.py"))
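
Note that execfile() only exists in Python 2. If the notebook kernel is Python 3, the last line above can be replaced with this equivalent (a minimal sketch, same shell.py path assumed):

with open(os.path.join(spark_home, "python/pyspark/shell.py")) as f:
    exec(f.read())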

I then run ipython notebook --profile=pyspark and the notebook works fine, but sc (the Spark context) is not recognised.
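
For reference, a working profile leaves sc predefined in the first notebook cell; a quick check (hypothetical session):

try:
    print(sc.version)  # a working setup predefines the SparkContext
except NameError:
    print("sc is not defined -- the startup file did not run")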

Has anyone managed to get this working with Spark 1.5.1?

EDIT: you can follow this guide to have it working:

https://gist.github.com/tommycarpi/f5a67c66a8f2170e263c

Recommended answer

I have Jupyter installed, and it is indeed simpler than you think:


  1. Install Anaconda for OSX.

  2. Install jupyter by typing the next line in your terminal:

ilovejobs@mymac:~$ conda install jupyter


  3. Update jupyter just in case.

    ilovejobs@mymac:~$ conda update jupyter
    


  4. Download Apache Spark and compile it, or download and uncompress Apache Spark 1.5.1 + Hadoop 2.6:

    ilovejobs@mymac:~$ cd Downloads
    ilovejobs@mymac:~/Downloads$ wget http://www.apache.org/dyn/closer.lua/spark/spark-1.5.1/spark-1.5.1-bin-hadoop2.6.tgz
    ilovejobs@mymac:~/Downloads$ tar -xzf spark-1.5.1-bin-hadoop2.6.tgz
    


  5. Create an Apps folder in your home directory, i.e.:

    ilovejobs@mymac:~/Downloads$ mkdir ~/Apps
    


  6. Move the uncompressed folder to the ~/Apps directory (the tarball extracts to spark-1.5.1-bin-hadoop2.6; rename it to spark-1.5.1 as you move it):

    ilovejobs@mymac:~/Downloads$ mv spark-1.5.1-bin-hadoop2.6/ ~/Apps/spark-1.5.1
    


  7. Move to the ~/Apps directory and verify that spark is there:

    ilovejobs@mymac:~/Downloads$ cd ~/Apps
    ilovejobs@mymac:~/Apps$ ls -l
    drwxr-xr-x ?? ilovejobs ilovejobs 4096 ?? ?? ??:?? spark-1.5.1
    


  8. Here is the first tricky part. Add the spark binaries to your $PATH:

    ilovejobs@mymac:~/Apps$ cd
    ilovejobs@mymac:~$ echo 'export PATH=$PATH:$HOME/Apps/spark-1.5.1/bin' >> .profile
    


  9. Here is the second tricky part. Add these environment variables too (they make bin/pyspark start the IPython notebook as the Spark driver instead of the plain Python shell):

    ilovejobs@mymac:~$ echo "export PYSPARK_DRIVER_PYTHON=ipython" >> .profile
    ilovejobs@mymac:~$ echo "export PYSPARK_DRIVER_PYTHON_OPTS=notebook" >> .profile
    


  10. Source the profile so these variables become available in this terminal:

    ilovejobs@mymac:~$ source .profile
    


  11. Create a ~/notebooks directory:

    ilovejobs@mymac:~$ mkdir notebooks
    


  12. Move to ~/notebooks and run pyspark:

    ilovejobs@mymac:~$ cd notebooks
    ilovejobs@mymac:~/notebooks$ pyspark
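
    A notebook opened from this pyspark session should have sc predefined. A quick smoke test for the first cell (a minimal sketch; the values are illustrative):

        # `sc` is created by pyspark's bootstrap, not imported by you
        print(sc.version)                        # e.g. 1.5.1
        print(sc.parallelize(range(100)).sum())  # 4950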
    


    Notice that you can add those variables to the .bashrc located in your home directory instead. Now be happy: you should be able to run Jupyter with a PySpark kernel (it will show up as Python 2, but it will actually use Spark).
