How to import pyspark in anaconda
Question
I am trying to import and use pyspark with anaconda.
After installing spark and setting the $SPARK_HOME variable, I tried:
$ pip install pyspark
This won't work (of course) because I discovered that I need to tell Python to look for pyspark under $SPARK_HOME/python/. The problem is that to do that, I need to set $PYTHONPATH, and Anaconda doesn't use that environment variable.
I tried to copy the content of $SPARK_HOME/python/ to ANACONDA_HOME/lib/python2.7/site-packages/, but it won't work.
Is there any solution to use pyspark in anaconda?
Answer
You can simply set the PYSPARK_DRIVER_PYTHON and PYSPARK_PYTHON environment variables to use either the root Anaconda Python or a specific Anaconda environment. For example:
export ANACONDA_ROOT=~/anaconda2
export PYSPARK_DRIVER_PYTHON=$ANACONDA_ROOT/bin/ipython
export PYSPARK_PYTHON=$ANACONDA_ROOT/bin/python
or
export PYSPARK_DRIVER_PYTHON=$ANACONDA_ROOT/envs/foo/bin/ipython
export PYSPARK_PYTHON=$ANACONDA_ROOT/envs/foo/bin/python
When you use $SPARK_HOME/bin/pyspark / $SPARK_HOME/bin/spark-submit, it will choose the correct environment. Just remember that PySpark has to use the same Python version on all machines.
On a side note, using PYTHONPATH should work just fine, even if it is not recommended.
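For completeness, the PYTHONPATH route from the question looks roughly like this (a sketch: the /opt/spark install location is an assumption, and the bundled Py4J zip name varies between Spark releases, which is why it is discovered with a glob rather than hard-coded):

```shell
export SPARK_HOME=/opt/spark   # assumed install location; adjust to yours
# PySpark itself lives under $SPARK_HOME/python, and Spark's bundled Py4J
# zip must also be importable; pick up whichever version this release ships.
PY4J_ZIP=$(ls "$SPARK_HOME"/python/lib/py4j-*-src.zip 2>/dev/null | head -n 1)
export PYTHONPATH="$SPARK_HOME/python:$PY4J_ZIP:$PYTHONPATH"
```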