Install pyspark for beginners

Question

I am currently taking PySpark courses on DataCamp and would now like to start building some projects of my own on my own computer using PySpark. However, I am getting thoroughly confused by the installation of Spark/PySpark itself and by how to run it in a Jupyter notebook.

I have watched installation videos on YouTube, such as the one from Edureka, which seems to install it by creating a virtual machine and connecting it to another one. I do not want that; all I want is to install PySpark locally on my laptop.

I have also followed the installation instructions from this link:

https://medium.com/@brajendragouda/installing-apache-spark-on-ubuntu-pyspark-on-juputer-ca8e40e8e655

But when I run the command pyspark in my terminal, I get a command-not-found response.

I have looked at the documentation on the Spark site, which I do not find very newbie friendly, and was wondering whether anyone has a link to an easy-to-follow guide for this install.

My current OS is the latest version of Ubuntu. I am only just learning to use the shell and bash scripts, so it is all very new to me, and a lot of what I have been looking at is starting to confuse me.

Any links or advice would be much appreciated.

Recommended answer

There is a Docker PySpark image that makes the setup pretty easy. Here's a link describing the setup process. With Docker installed and running, entering the following command will launch a Jupyter notebook environment in which you can run PySpark:

docker run -it -p 8888:8888 jupyter/pyspark-notebook
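Once the notebook is open in the browser, a small smoke test can confirm that PySpark actually works there. The following is a minimal sketch, assuming the image ships with the pyspark package available to the notebook kernel; the function name spark_smoke_test and the app name are made up for illustration.

```python
def spark_smoke_test():
    """Start a local Spark session, build a tiny DataFrame, return its row count."""
    # Imported inside the function so that defining it never fails;
    # a missing pyspark package only surfaces when the test is run.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[*]")      # run Spark locally, using all available cores
        .appName("smoke-test")   # hypothetical name, any string works
        .getOrCreate()
    )
    try:
        # Three rows, two columns: enough to prove a job can execute.
        df = spark.createDataFrame(
            [(1, "a"), (2, "b"), (3, "c")],
            ["id", "letter"],
        )
        return df.count()
    finally:
        # Release the session so later cells can create their own.
        spark.stop()
```

Calling spark_smoke_test() in a notebook cell should return the number of rows in the sample DataFrame. An ImportError at that point would suggest the kernel cannot see pyspark.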

Run this way, though, the container uses a temporary filesystem, which makes reading and saving data difficult. To point the environment at a directory on your own filesystem, run:

docker run -it --rm -p 8888:8888 -p 4040:4040 -p 4041:4041 -v /Users/your/path:/home/jovyan jupyter/pyspark-notebook
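If the host path changes from project to project, the command can be assembled programmatically instead of edited by hand. The following is a rough sketch, not part of the original answer: build_docker_command is a hypothetical helper, and it only prints the command rather than running it. The image name, ports, and container path are the ones given above.

```python
import shlex
from pathlib import Path

IMAGE = "jupyter/pyspark-notebook"   # image name from the answer
CONTAINER_HOME = "/home/jovyan"      # container directory from the answer


def build_docker_command(host_dir):
    """Return the `docker run` argument list that mounts host_dir into the container."""
    # Expand "~" and relative paths so the bind mount gets an absolute host path.
    host_path = Path(host_dir).expanduser().resolve()
    return [
        "docker", "run", "-it", "--rm",
        "-p", "8888:8888",   # Jupyter notebook server
        "-p", "4040:4040",   # Spark application UI (default port)
        "-p", "4041:4041",   # next port Spark tries if 4040 is taken
        "-v", f"{host_path}:{CONTAINER_HOME}",
        IMAGE,
    ]


# Print a copy-pasteable command for the current directory.
print(shlex.join(build_docker_command(".")))
```

Returning a list rather than a single string keeps the arguments safe to pass to subprocess.run later, and shlex.join takes care of quoting if the path contains spaces.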
