Spark Exception: Python in worker has different version 3.4 than that in driver 3.5


Problem Description

I am using Amazon EC2, with my master and development servers combined on one instance, and a separate instance for a single worker.

I am new to this, but I have managed to get Spark working in standalone mode. Now I am trying cluster mode. Both the master and the worker are active (I can see their web UIs, and they are functioning).

I have Spark 2.0, and I have installed the latest Anaconda 4.1.1, which comes with Python 3.5.2. On both the worker and the master, if I open pyspark and check sys.version_info, I get 3.5.2. I have also set all the environment variables correctly (as described in other posts on Stack Overflow and Google), e.g. PYSPARK_PYTHON.
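
(A quick sanity check, as a sketch assuming an active pyspark shell where sc is already defined: compare the driver's interpreter against what the executors actually run. Note that if the versions really do differ, this job itself will fail with the error shown further down.)

import sys

print("driver :", sys.version)
# Each task re-imports sys on the worker, so this reports the
# executors' interpreter(s), not the driver's.
print("workers:", sc.parallelize(range(4), 4)
                    .map(lambda _: sys.version)
                    .distinct()
                    .collect())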

There is no Python 3.4 installed anywhere anyway, so I am wondering how I can fix this.
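
(One way to double-check that claim, as a hypothetical sketch to run on each node: scan PATH for python executables and print the version of each, in case a stray 3.4 interpreter is hiding somewhere.)

import glob
import os
import subprocess

seen = set()
for d in os.environ.get("PATH", "").split(os.pathsep):
    for exe in glob.glob(os.path.join(d, "python*")):
        name = os.path.basename(exe)
        # skip helpers like python-config; follow symlinks and dedupe
        if "-" in name or not os.access(exe, os.X_OK):
            continue
        real = os.path.realpath(exe)
        if real in seen:
            continue
        seen.add(real)
        try:
            # older Pythons print --version to stderr, so merge streams
            out = subprocess.check_output([real, "--version"],
                                          stderr=subprocess.STDOUT)
            print(real, out.decode().strip())
        except (OSError, subprocess.CalledProcessError):
            pass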

I get the error by running these commands:

rdd = sc.parallelize([1,2,3])
rdd.count()    

The error happens on the count() method:

16/08/13 18:44:31 ERROR Executor: Exception in task 1.0 in stage 2.0 (TID 17)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 123, in main
    ("%d.%d" % sys.version_info[:2], version))
Exception: Python in worker has different version 3.4 than that in driver 3.5, PySpark cannot run with different minor versions

at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:85)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
16/08/13 18:44:31 ERROR Executor: Exception in task 1.1 in stage 2.0 (TID 18)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 123, in main
    ("%d.%d" % sys.version_info[:2], version))
Exception: Python in worker has different version 3.4 than that in driver 3.5, PySpark cannot run with different minor versions
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:85)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Answer

Since you already use Anaconda, you can simply create an environment with the desired Python version:

conda create --name foo python=3.4
source activate foo

python --version
## Python 3.4.5 :: Continuum Analytics, Inc

and use it as PYSPARK_DRIVER_PYTHON:

export PYSPARK_DRIVER_PYTHON=/path/to/anaconda/envs/foo/bin/python
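
(Alternatively, a minimal sketch rather than the answerer's method: if you submit a standalone script instead of using the pyspark shell, you can pin the worker interpreter from Python itself, since PySpark reads PYSPARK_PYTHON from the driver's environment when it launches Python workers. The path below is the same assumed conda env as above and must exist on every node.)

import os

# Must be set before the SparkContext is created; the driver forwards
# this executable path to the executors.
os.environ["PYSPARK_PYTHON"] = "/path/to/anaconda/envs/foo/bin/python"

from pyspark import SparkConf, SparkContext

sc = SparkContext(conf=SparkConf().setAppName("version-check"))
print(sc.parallelize([1, 2, 3]).count())  # prints 3 once the versions match
sc.stop()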
