Installing PySpark on Windows
Question
I have a few questions I would like to clarify before installing. Please bear with me, as I am still new to data science and to installing packages.
1) I can do a pip install pyspark on my Windows machine, but when I try to run the sample script below it tells me that SPARK_HOME is not set. Do I still need to set SPARK_HOME, and how do I go about doing it? The blogs I have found online manually extract the Spark files from the Spark website and then set SPARK_HOME and PYTHONPATH, but I thought this step was eliminated by pip install pyspark.
import findspark
findspark.init()  # locates a Spark installation and adds it to sys.path

import pyspark  # only run after findspark.init()
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.sql('''select 'spark' as hello ''')
df.show()
2) For IntelliJ, do I still need to do additional configuration once I have installed pyspark and set things up as needed in 1)?
Thank you so much, and once again my apologies; please excuse me if I ask a silly question.
Answer
See the instructions here:
https://medium.com/@GalarnykMichael/install-spark-on-windows-pyspark-4498a5d8d66c
You'll need to install Apache Spark (the whole thing) too!
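Once the full distribution is installed, the usual stumbling block is pointing Python at it. A minimal sketch, assuming a hypothetical extraction folder (the path below is a placeholder; adjust it to wherever you actually unpacked Spark):

```python
import os

# Hypothetical install location -- replace with the folder you extracted Spark into.
spark_home = r"C:\spark\spark-3.5.0-bin-hadoop3"

# Setting the variable in-process covers the current Python session only;
# for a permanent fix on Windows, add SPARK_HOME under
# System Properties > Environment Variables instead.
os.environ["SPARK_HOME"] = spark_home

# findspark reads SPARK_HOME (or accepts the path directly) and adds Spark's
# Python libraries to sys.path. Left commented out here because the path
# above is only a placeholder:
# import findspark
# findspark.init()             # uses the SPARK_HOME environment variable
# findspark.init(spark_home)   # or pass the path explicitly
```

After this, the sample script from the question should find Spark without complaining that SPARK_HOME is unset.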
I did it and it takes a good while. For the most part, when I'm learning or helping a friend, I'll use the notebooks at Zepl or Databricks instead.
If you do choose to install the whole thing and run into trouble, don't be shy about posting another question :)