Encountered an error: cannot run program on pyspark


Problem description

I entered these commands in the pyspark shell:

 In [1]: myrdd = sc.textFile("Cloudera-cdh5.repo")
 In [2]: myrdd.map(lambda x:x.upper()).collect()

When I execute 'myrdd.map(lambda x:x.upper()).collect()', I get an error.

The error message is as follows:

 Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
    : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, tiger): java.io.IOException: Cannot run program "/usr/local/bin/python3": error=2, No such file or directory
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
        at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:160)
        at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:86)
        at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62)
        at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:135)
        at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:73)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: java.io.IOException: error=2, No such file or directory
        at java.lang.UNIXProcess.forkAndExec(Native Method)
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:186)
        at java.lang.ProcessImpl.start(ProcessImpl.java:130)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
        ... 13 more

The file /usr/local/bin/python3 exists on the disk.

How can I solve the above error?

Recommended answer

You need to grant access permissions on the path /usr/local/bin/python3; you can use the command sudo chmod 777 /usr/local/bin/python3/*.
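Before changing permissions, it may help to confirm what is actually wrong with the path on the node that ran the failing task (the trace suggests a host named tiger). The trace reports error=2, "No such file or directory", which usually means the path is missing on that machine, whereas a permission problem would typically surface as error=13. The sketch below uses only the standard library; `check_interpreter` is a hypothetical helper name, not part of Spark, and it only inspects the machine it runs on.

```python
import os


def check_interpreter(path):
    """Classify why an interpreter path might fail to launch (hypothetical helper)."""
    # os.path.exists follows symlinks, so a dangling symlink also counts as missing.
    if not os.path.exists(path):
        return "missing"
    # A directory at this path cannot be executed as a program.
    if not os.path.isfile(path):
        return "not a file"
    # Check the execute permission for the current user.
    if not os.access(path, os.X_OK):
        return "not executable"
    return "ok"


# Run this on each worker node, not only on the driver.
print(check_interpreter("/usr/local/bin/python3"))
```

If the result is "missing" on a worker while the file exists on the driver, the interpreter is probably installed at a different location on that worker.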

I think this issue is caused by the PYSPARK_PYTHON variable, which tells Spark where Python is located on every node. You can use the command below:

export PYSPARK_PYTHON=/usr/local/bin/python3
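For a standalone script (rather than the interactive pyspark shell, where `sc` already exists), the same variable can be set from Python before the SparkContext is created. This is a minimal sketch under that assumption; `set_pyspark_python` is a hypothetical helper name, and the path must be valid on every worker node, not just on the driver.

```python
import os


def set_pyspark_python(path, env=os.environ):
    """Point PYSPARK_PYTHON at `path` in the given environment (hypothetical helper)."""
    env["PYSPARK_PYTHON"] = path
    return env["PYSPARK_PYTHON"]


# Must run before the SparkContext is constructed.
set_pyspark_python("/usr/local/bin/python3")

# Assumed follow-up in a real script (requires pyspark to be installed):
# from pyspark import SparkContext
# sc = SparkContext(appName="example")
```

Setting the variable after the context has been created is unlikely to affect it, which is why the shell variant uses export before launching pyspark.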
