Install PySpark for beginners
Question
I am currently taking PySpark courses on DataCamp and would now like to start building some projects of my own with PySpark on my own computer. However, I am becoming massively confused by the installation of Spark/PySpark itself and by how to run it in a Jupyter notebook.
I have looked at videos on YouTube about the installation, such as Edureka's, which seems to install it by creating a VM and connecting it to another one. I do not want that; all I want is to install PySpark locally on my laptop.
I have also followed the installation instructions from this link:
https://medium.com/@brajendragouda/installing-apache-spark-on-ubuntu-pyspark-on-juputer-ca8e40e8e655
But when I run the command pyspark in my terminal, the shell responds that there is no such command.
I have looked at the documentation on the Spark site, which I do not find very newbie friendly, and was wondering if anyone has a link to an easy-to-follow guide for this install.
My current OS is the latest version of Ubuntu. I am only just learning to use the shell and bash scripts, so it is all very new, and a lot of the material I have been looking at is starting to confuse me.
Any links or advice would be much appreciated.
Recommended answer
There is a Docker PySpark image that makes the setup pretty easy. Here's a link describing the setup process. With Docker installed and running, entering the following command will launch a Jupyter notebook environment in which you can run PySpark:
docker run -it -p 8888:8888 jupyter/pyspark-notebook
By default, though, the container's filesystem is temporary, which makes reading and saving data difficult. To point the environment to your own filesystem, run:
docker run -it --rm -p 8888:8888 -p 4040:4040 -p 4041:4041 -v /Users/your/path:/home/jovyan jupyter/pyspark-notebook
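Docker's -v option needs an absolute host path for a bind mount (a bare name is treated as a named volume instead), so it can help to build the command from whatever project folder you are in. The helper below is a hypothetical sketch, not part of Docker or Jupyter: the function name pyspark_docker_command is made up, and it only assembles and prints the command rather than running it.

```python
import shlex
from pathlib import Path


def pyspark_docker_command(host_dir, image="jupyter/pyspark-notebook"):
    """Return the docker run argument list that mounts host_dir at /home/jovyan.

    Hypothetical helper for illustration; it does not invoke Docker itself.
    """
    # Expand "~" and turn relative paths into absolute ones, because
    # -v treats a non-absolute source as a named volume, not a folder.
    host_path = Path(host_dir).expanduser().resolve()
    return [
        "docker", "run", "-it", "--rm",
        "-p", "8888:8888",   # Jupyter notebook server
        "-p", "4040:4040",   # ports from the command in the answer above
        "-p", "4041:4041",
        "-v", f"{host_path}:/home/jovyan",
        image,
    ]


# Build the command for the current directory and print it for copy/paste.
cmd = pyspark_docker_command(".")
print(shlex.join(cmd))
```

Paste the printed line into your terminal; anything you save under /home/jovyan in the notebook should then appear in that folder on your laptop.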