Spark job fails in yarn-cluster mode
Question
My job runs perfectly on Spark in yarn-client mode but fails in yarn-cluster mode with the error "File does not exist: hdfs://192.xxx.x.x:port/user/hduser/.sparkStaging/application_1442810383301_0016/pyspark.zip", even though the log shows that the file was uploaded to that directory. What could be the cause?
Here is the full error log:
Application application_1449548654695_0003 failed 2 times due to AM Container for appattempt_1449548654695_0003_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://server1:8088/cluster/app/application_1449548654695_0003Then, click on links to logs of each attempt.
Diagnostics: File does not exist: hdfs://192.168.0.109:54310/user/hduser/.sparkStaging/application_1449548654695_0003/pyspark.zip
java.io.FileNotFoundException: File does not exist: hdfs://192.168.0.109:54310/user/hduser/.sparkStaging/application_1449548654695_0003/pyspark.zip
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Failing this attempt. Failing the application.
Recommended answer
For me, pointing these variables at the Hadoop config directory didn't help:
HADOOP_CONF_DIR=/etc/hadoop
YARN_CONF_DIR=/etc/hadoop
Instead, the key was that spark.hadoop.fs.defaultFS must be set in SparkConf inside Python.
Below is my code. Before running it, I set environment variables for the hosts of the resource manager and the HDFS file system.
import os

from pyspark import SparkConf, SparkContext


def test():
    print('Hello world')


if __name__ == '__main__':
    _app_name = "DemoApp"
    # I define these environment variables before calling,
    # e.g., HADOOP_RM_HOST='myhost.edu'
    _rm_host = os.environ['HADOOP_RM_HOST']
    _fs_host = os.environ['HADOOP_FS_HOST']

    # It's written that these environment variables should be set, but they don't do anything for my Python job.
    # Adding core-site.xml, yarn-site.xml, etc. to the Python path doesn't do anything either.
    # HADOOP_CONF_DIR=/etc/hadoop
    # YARN_CONF_DIR=/etc/hadoop

    # Run without YARN, max threads (kept for reference; not used below)
    local_conf = SparkConf().setAppName(_app_name) \
        .setMaster("local[*]")

    # If you get a "bad substitution" error: https://medium.com/@o20021106/run-pyspark-on-yarn-c7cd04b87d81
    # There must be a hdfs://user/ID directory for the ID that this runs under (owned by ID)
    # https://www.youtube.com/watch?v=dN60fkxABZs
    # spark.hadoop.fs.defaultFS is required so that the files will be copied to the cluster
    # If the cluster doesn't dynamically allocate executors, then .set("spark.executor.instances", "4")
    yarn_conf = SparkConf().setAppName(_app_name) \
        .setMaster("yarn") \
        .set("spark.executor.memory", "4g") \
        .set("spark.hadoop.fs.defaultFS", "hdfs://{}:8020".format(_fs_host)) \
        .set("spark.hadoop.yarn.resourcemanager.hostname", _rm_host) \
        .set("spark.hadoop.yarn.resourcemanager.address", "{}:8050".format(_rm_host))

    sc = SparkContext(conf=yarn_conf)
    test()
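If it helps, the three cluster-specific settings from the code above can be collected in one small helper, so that a missing host fails early with a clear message instead of surfacing later as a staging-file error. This is only a sketch: the function name yarn_settings is hypothetical, and the default ports 8020 and 8050 are simply the ones used in the answer's code, so they may differ on your cluster.

```python
def yarn_settings(fs_host, rm_host, fs_port=8020, rm_port=8050):
    """Return the cluster-specific Spark settings as a plain dict.

    fs_host: HDFS NameNode host (HADOOP_FS_HOST in the answer's code)
    rm_host: YARN ResourceManager host (HADOOP_RM_HOST in the answer's code)
    The default ports are assumptions copied from the answer's code.
    """
    if not fs_host:
        raise ValueError("HDFS NameNode host is empty")
    if not rm_host:
        raise ValueError("YARN ResourceManager host is empty")
    return {
        # Filesystem that Spark stages files such as pyspark.zip on
        "spark.hadoop.fs.defaultFS": "hdfs://{}:{}".format(fs_host, fs_port),
        "spark.hadoop.yarn.resourcemanager.hostname": rm_host,
        "spark.hadoop.yarn.resourcemanager.address": "{}:{}".format(rm_host, rm_port),
    }


if __name__ == "__main__":
    # Example host names are placeholders, not real machines.
    for key, value in sorted(yarn_settings("namenode.example", "rm.example").items()):
        print(key, "=", value)
```

The resulting dict can then be applied to a SparkConf with setAll, for example SparkConf().setMaster("yarn").setAll(yarn_settings(fs_host, rm_host).items()).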