How to use Spark with Python or Jupyter Notebook


Question

I am trying to work with 12 GB of data in Python, and I desperately need Spark for that, but I guess I'm too stupid to use the command line on my own or with help from the internet, which is why I am turning to SO.

So far I have downloaded Spark and extracted the tar file (sorry for the language, but I am feeling stupid and lost), but now I can't see where to go next. The documentation on the Spark website says:

"Spark also provides a Python API. To run Spark interactively in a Python interpreter, use bin/pyspark"

But where do I run this? Please help. Edit: I am using Windows 10.

Note: I have always had problems when trying to install things, mainly because I can't seem to understand the Command Prompt.

Recommended answer

If you are more familiar with Jupyter Notebook, you can install Apache Toree, which integrates PySpark, Scala, SQL, and SparkR kernels with Spark.

To install Toree:

pip install toree
jupyter toree install --spark_home=path/to/your/spark_directory --interpreters=PySpark
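
The --spark_home value should be the folder created when the Spark archive was extracted, which is also where the bin/pyspark script mentioned in the question lives. A minimal Python sketch for checking a candidate folder before passing it to the installer follows; the helper name looks_like_spark_home and the fallback path C:\spark are hypothetical, and the check only looks for the launcher scripts that Spark distributions ship under bin (pyspark, and pyspark.cmd for Windows).

```python
import os
from pathlib import Path


def looks_like_spark_home(path):
    """Return True if `path` contains a Spark launcher script under bin/."""
    bin_dir = Path(path) / "bin"
    # Spark ships bin/pyspark (shell script) and bin/pyspark.cmd (Windows).
    return any((bin_dir / name).is_file() for name in ("pyspark", "pyspark.cmd"))


if __name__ == "__main__":
    # Hypothetical fallback; replace with the folder where you extracted Spark.
    candidate = os.environ.get("SPARK_HOME", r"C:\spark")
    if looks_like_spark_home(candidate):
        print("Use --spark_home=" + candidate)
    else:
        print("No bin/pyspark found under " + candidate)
```

If the check passes, that same folder is the one to change into from the Command Prompt in order to run bin\pyspark directly, assuming Java is installed.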

If you want to install other kernels, you can use:

jupyter toree install --interpreters=SparkR,SQL,Scala

Now run:

jupyter notebook

In the UI, when you create a new notebook, you should see the following kernels available:

Apache Toree-Pyspark
Apache Toree-SparkR
Apache Toree-SQL
Apache Toree-Scala
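
To check that the PySpark kernel is actually talking to Spark, a first notebook cell could count lines in the data file. This is a minimal sketch, assuming the kernel exposes a SparkContext as `sc` the way the pyspark shell does (an assumption for Toree); the file path and search string are placeholders, and `count_matching_lines` is an illustrative helper, not part of Spark.

```python
def count_matching_lines(sc, path, needle):
    """Count lines of a text file that contain `needle`, using Spark RDDs."""
    lines = sc.textFile(path)  # lazily reads the file in partitions
    matches = lines.filter(lambda line: needle in line)
    return matches.count()  # triggers the actual computation


# In a notebook cell, with `sc` provided by the kernel (assumption):
# count_matching_lines(sc, "path/to/your/data.csv", "2016")
```

Because Spark reads the file in partitions rather than loading it whole, the same call should work on a file larger than available memory.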
