How do I run pyspark with jupyter notebook?


Question


I am trying to fire up the jupyter notebook when I run the command pyspark in the console. When I type it now, it only starts an interactive shell in the console. However, this is not convenient for typing long lines of code. Is there a way to connect the jupyter notebook to the pyspark shell? Thanks.

Solution

I'm assuming you already have spark and jupyter notebook installed and that they work flawlessly independently of each other.

If that is the case, then follow the steps below and you should be able to fire up a jupyter notebook with a (py)spark backend.

  1. Go to your spark installation folder; there should be a bin directory in it: /path/to/spark/bin

  2. Create a file, let's call it start_pyspark.sh

  3. Open start_pyspark.sh and write something like:

    #!/bin/bash
    
    # use your anaconda python for the workers and jupyter as the driver
    export PYSPARK_PYTHON=/path/to/anaconda3/bin/python
    export PYSPARK_DRIVER_PYTHON=/path/to/anaconda3/bin/jupyter
    export PYSPARK_DRIVER_PYTHON_OPTS="notebook --NotebookApp.open_browser=False --NotebookApp.ip='*' --NotebookApp.port=8880"
    
    # hand any script arguments through to pyspark
    pyspark "$@"

Replace the /path/to ... parts with the paths where your python and jupyter binaries are installed, respectively.
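
After saving the script, it also needs to be executable before you can run it. A minimal sketch, assuming you created it in the bin directory from step 1:

    # make the launcher executable, then start it
    chmod +x /path/to/spark/bin/start_pyspark.sh
    /path/to/spark/bin/start_pyspark.sh

With the options above, jupyter will not open a browser on its own (--NotebookApp.open_browser=False); point your browser at port 8880 on the machine where the script is running.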

  4. Most probably this step is already done, but just in case, modify your ~/.bashrc file by adding the following lines:

        # Spark
        export PATH="/path/to/spark/bin:/path/to/spark/sbin:$PATH"
        export SPARK_HOME="/path/to/spark"
        export SPARK_CONF_DIR="/path/to/spark/conf"
    

Run source ~/.bashrc and you are set.
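
If you want to verify that the variables took effect in the new shell, a quick sanity check could look like this:

    echo "$SPARK_HOME"   # should print /path/to/spark
    which pyspark        # should resolve to /path/to/spark/bin/pyspark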

Go ahead and try start_pyspark.sh.
You can also pass arguments to the script, something like start_pyspark.sh --packages dibbhatt:kafka-spark-consumer:1.0.14.
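
Since the script ends with pyspark "$@", every argument is forwarded to pyspark unchanged, so the usual spark-submit style flags work as well. A hypothetical invocation (the local[4] master here is just an illustration):

    # run the notebook-backed shell on 4 local cores with an extra package
    start_pyspark.sh --master local[4] --packages dibbhatt:kafka-spark-consumer:1.0.14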

Hope it works out for you.
