'Could not find valid SPARK_HOME while searching' on AWS EMR


Problem description

While running a Python script on an EMR cluster using the spark-submit command, the process got stuck at 10% (as can be seen through yarn application --list). When I examined the logs, all core executors showed the following type of message as their most recent error:

Could not find valid SPARK_HOME while searching ['/mnt1/yarn/usercache/hadoop/appcache/application_x_0001', '/mnt/yarn/usercache/hadoop/filecache/11/pyspark.zip/pyspark', '/mnt1/yarn/usercache/hadoop/appcache/application_x_0001/container_x_0001_01_000002/pyspark.zip/pyspark', '/mnt1/yarn/usercache/hadoop/appcache/application_x_0001/container_x_0001_01_000002']

The code ran well locally, and since Spark was installed on all core nodes, I couldn't figure out the cause of this issue or how to fix the error. Apart from one post in Portuguese without a clear answer, I couldn't find any post offering a solution to this issue.

Recommended answer

Finally I discovered that the cause of this error was calling the SparkContext object from functions that run on the core nodes and are not part of the main script, even though the context had already been created in the main script. Apparently the following command

from pyspark import SparkContext
sc = SparkContext.getOrCreate()

creates a new SparkContext object even if one has already been created in the main script on the master node. Therefore, to prevent this issue, if the SparkContext has to be used in a script other than the main script, it must be explicitly passed from the main script to that side script (e.g., as a function parameter).
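As a minimal sketch of that pattern (the helper name build_rdd and its body are made up for illustration, not taken from the original code), the context is created exactly once in the main script and handed to the side function as a parameter instead of calling SparkContext.getOrCreate() inside it:

# Hypothetical example; in the real project build_rdd would live in a
# separate "side" module that is imported by the main script.
from pyspark import SparkContext


def build_rdd(sc, values):
    # Receives the already-created SparkContext as a parameter and never
    # constructs its own with SparkContext.getOrCreate().
    return sc.parallelize(values).map(lambda x: x * 2)


if __name__ == "__main__":
    # Main script, submitted with spark-submit and running on the master node.
    sc = SparkContext.getOrCreate()              # created exactly once, here
    result = build_rdd(sc, range(10)).collect()  # context passed in explicitly
    print(result)
    sc.stop()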
