Can PySpark work without Spark?

Problem description

I have installed PySpark standalone/locally (on Windows) using

pip install pyspark

I was a bit surprised that I can already run pyspark from the command line or use it in Jupyter notebooks, and that it does not need a proper Spark installation (e.g. I did not have to do most of the steps in this tutorial: https://medium.com/@GalarnykMichael/install-spark-on-windows-pyspark-4498a5d8d66c).
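
For example, a minimal smoke test like the following runs straight from the pip install (a sketch; the app name and sample rows are arbitrary placeholders):

from pyspark.sql import SparkSession

# Runs entirely in local mode; no separate Spark download or SPARK_HOME is needed.
spark = SparkSession.builder.master("local[*]").appName("smoke-test").getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "letter"])
df.show()
spark.stop()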

Most of the tutorials that I run into say one needs to "install Spark before installing PySpark". That would agree with my view of PySpark being basically a wrapper over Spark. But maybe I am wrong here - can someone explain:

  • what is the exact connection between these two technologies?
  • why is installing PySpark enough to make it run? Does it actually install Spark under the hood? If yes, where?
  • if you install only PySpark, is there anything you miss out on (e.g. I cannot find the sbin folder, which contains e.g. the script to start the history server)?

Recommended answer

As of v2.2, executing pip install pyspark will install Spark.

If you're going to use PySpark, it's clearly the simplest way to get started.
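
A quick way to convince yourself that the pip package really bundles Spark is to compare the Python-side version with the one reported by the JVM; a minimal sketch (the app name is arbitrary):

import pyspark
from pyspark import SparkContext

print(pyspark.__version__)               # version of the bundled distribution

sc = SparkContext("local[*]", "version-check")
print(sc.version)                        # reported by the JVM side; should match
print(sc.parallelize(range(10)).sum())   # 45, so the engine actually runs jobs
sc.stop()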

On my system, Spark is installed inside my virtual environment (miniconda) at lib/python3.6/site-packages/pyspark/jars.
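
You can locate it programmatically; for a pip install, the pyspark package directory itself acts as SPARK_HOME (a sketch, assuming a standard site-packages layout):

import os
import pyspark

# The installed package directory doubles as SPARK_HOME for pip installs.
spark_home = os.path.dirname(pyspark.__file__)
print(spark_home)  # e.g. .../site-packages/pyspark

# The bundled Spark JVM code lives in the jars/ subfolder.
print(sorted(os.listdir(os.path.join(spark_home, "jars")))[:5])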
