How do I run pyspark with jupyter notebook?


Problem description

I am trying to fire up a jupyter notebook when I run the command pyspark in the console. When I type it now, it only starts an interactive shell in the console. However, that is not convenient for typing long lines of code. Is there a way to connect the jupyter notebook to the pyspark shell? Thanks.

Recommended answer

I'm assuming you already have spark and jupyter notebook installed and that they work flawlessly independently of each other.

If that is the case, then follow the steps below and you should be able to fire up a jupyter notebook with a (py)spark backend.

  1. Go to your spark installation folder; there should be a bin directory in it: /path/to/spark/bin

  2. Create a file; let's call it start_pyspark.sh
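As a concrete illustration (the paths below are placeholders matching the example above, not values from your system), you could create the script inside spark's bin directory and make it executable so it can be run directly:

    # Create the launcher script and make it executable
    # (/path/to/spark is a placeholder; use your actual spark installation path)
    touch /path/to/spark/bin/start_pyspark.sh
    chmod +x /path/to/spark/bin/start_pyspark.sh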

  3. Open start_pyspark.sh and write something like this:


    #!/bin/bash

    # Python executable that the spark workers will use
    export PYSPARK_PYTHON=/path/to/anaconda3/bin/python
    # Run the driver through jupyter instead of the plain pyspark shell
    export PYSPARK_DRIVER_PYTHON=/path/to/anaconda3/bin/jupyter
    # Arguments for jupyter: start a notebook server, don't open a browser,
    # listen on all interfaces on port 8880
    export PYSPARK_DRIVER_PYTHON_OPTS="notebook --NotebookApp.open_browser=False --NotebookApp.ip='*' --NotebookApp.port=8880"

    # Launch pyspark, forwarding any command-line arguments to it
    pyspark "$@"

Replace the /path/to ... with the paths where you have installed your python and jupyter binaries, respectively.
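If you are not sure which paths to use, a quick way to find them (assuming the anaconda python and jupyter you want are already on your PATH) is:

    # Print the locations of the binaries to substitute into the script
    which python    # value for PYSPARK_PYTHON
    which jupyter   # value for PYSPARK_DRIVER_PYTHON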

  4. Most probably this step is already done, but just in case, modify your ~/.bashrc file by adding the following lines:


    # Spark
    export PATH="/path/to/spark/bin:/path/to/spark/sbin:$PATH"
    export SPARK_HOME="/path/to/spark"
    export SPARK_CONF_DIR="/path/to/spark/conf"

Run source ~/.bashrc and you are all set.
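To confirm the environment was picked up, a minimal sanity check (the exact output depends on your installation) is:

    # Verify that the spark variables and binaries are visible in the shell
    echo "$SPARK_HOME"
    which pyspark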

Go ahead and try start_pyspark.sh.
You can also pass arguments to the script, for example start_pyspark.sh --packages dibbhatt:kafka-spark-consumer:1.0.14.
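Since the script forwards its arguments through "$@", any standard pyspark/spark-submit options can be passed the same way. For example (the option values here are purely illustrative):

    # Run spark locally with 4 cores and 4g of driver memory
    start_pyspark.sh --master "local[4]" --driver-memory 4g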

Hope it works for you.
