Spark Job Container exited with exitCode: -1000
Problem description
I have been struggling to run a sample job with Spark 2.0.0 in YARN cluster mode. The job exits with exitCode: -1000 and gives no other clues. The same job runs properly in local mode.
Spark command:
spark-submit \
--conf "spark.yarn.stagingDir=/xyz/warehouse/spark" \
--queue xyz \
--class com.xyz.TestJob \
--master yarn \
--deploy-mode cluster \
--conf "spark.local.dir=/xyz/warehouse/tmp" \
/xyzpath/java-test-1.0-SNAPSHOT.jar $@
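TestJob class:
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class TestJob {
    public static void main(String[] args) throws InterruptedException {
        // Master and deploy mode are supplied by spark-submit, so the conf stays empty.
        SparkConf conf = new SparkConf();
        JavaSparkContext jsc = new JavaSparkContext(conf);
        // Distribute four integers and count them as a minimal smoke test.
        System.out.println(
            "Total count:" +
            jsc.parallelize(Arrays.asList(new Integer[]{1, 2, 3, 4})).count());
        jsc.stop();
    }
}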
Error log:
17/10/04 22:26:52 INFO Client: Application report for application_1506717704791_130756 (state: ACCEPTED)
17/10/04 22:26:52 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.xyz
start time: 1507181210893
final status: UNDEFINED
tracking URL: http://xyzserver:8088/proxy/application_1506717704791_130756/
user: xyz
17/10/04 22:26:53 INFO Client: Application report for application_1506717704791_130756 (state: ACCEPTED)
17/10/04 22:26:54 INFO Client: Application report for application_1506717704791_130756 (state: ACCEPTED)
17/10/04 22:26:55 INFO Client: Application report for application_1506717704791_130756 (state: ACCEPTED)
17/10/04 22:26:56 INFO Client: Application report for application_1506717704791_130756 (state: FAILED)
17/10/04 22:26:56 INFO Client:
client token: N/A
diagnostics: Application application_1506717704791_130756 failed 5 times due to AM Container for appattempt_1506717704791_130756_000005 exited with exitCode: -1000
For more detailed output, check application tracking page:http://xyzserver:8088/cluster/app/application_1506717704791_130756Then, click on links to logs of each attempt.
Diagnostics: Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.xyz
start time: 1507181210893
final status: FAILED
tracking URL: http://xyzserver:8088/cluster/app/application_1506717704791_130756
user: xyz
17/10/04 22:26:56 INFO Client: Deleted staging directory /xyz/spark/.sparkStaging/application_1506717704791_130756
Exception in thread "main" org.apache.spark.SparkException: Application application_1506717704791_130756 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1167)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1213)
When I browse to http://xyzserver:8088/cluster/app/application_1506717704791_130756, the page does not exist.
No YARN application logs are found either:
$yarn logs -applicationId application_1506717704791_130756
/apps/yarn/logs/xyz/logs/application_1506717704791_130756 does not have any log files.
What could be the root cause of this error, and how can I get detailed error logs?
Recommended answer
After spending nearly a whole day on this, I found the root cause. When I remove spark.yarn.stagingDir, the job starts working. I am still not sure why Spark complains about it.
Previous spark-submit:
spark-submit \
--conf "spark.yarn.stagingDir=/xyz/warehouse/spark" \
--queue xyz \
--class com.xyz.TestJob \
--master yarn \
--deploy-mode cluster \
--conf "spark.local.dir=/xyz/warehouse/tmp" \
/xyzpath/java-test-1.0-SNAPSHOT.jar $@
New spark-submit:
spark-submit \
--queue xyz \
--class com.xyz.TestJob \
--master yarn \
--deploy-mode cluster \
--conf "spark.local.dir=/xyz/warehouse/tmp" \
/xyzpath/java-test-1.0-SNAPSHOT.jar $@
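As background on the exit code itself: in Hadoop's YARN API, -1000 is the value of ContainerExitStatus.INVALID, the placeholder a container carries until a real exit status is recorded. It is commonly reported when the container never started, for example because its resources could not be localized from the staging directory. That would fit the absence of container logs here, although nothing in the question confirms it. Below is a minimal lookup sketch, assuming the constant values match your Hadoop version; the class name YarnExitCodes and the describe method are hypothetical helpers, not part of Hadoop or Spark.

```java
import java.util.HashMap;
import java.util.Map;

public class YarnExitCodes {
    // Assumed to mirror org.apache.hadoop.yarn.api.records.ContainerExitStatus;
    // verify against the Hadoop version running on your cluster.
    private static final Map<Integer, String> NAMES = new HashMap<>();

    static {
        NAMES.put(0, "SUCCESS");
        NAMES.put(-1000, "INVALID");
        NAMES.put(-100, "ABORTED");
        NAMES.put(-101, "DISKS_FAILED");
        NAMES.put(-102, "PREEMPTED");
        NAMES.put(-103, "KILLED_EXCEEDED_VMEM");
        NAMES.put(-104, "KILLED_EXCEEDED_PMEM");
    }

    // Returns the symbolic name for a known exit status, or a marker for unknown codes.
    public static String describe(int exitCode) {
        String name = NAMES.get(exitCode);
        return name != null ? name : "UNKNOWN(" + exitCode + ")";
    }

    public static void main(String[] args) {
        // The code reported in the diagnostics of the failed application.
        System.out.println(describe(-1000));
    }
}
```

Seeing INVALID rather than an ordinary process exit code suggests looking at the NodeManager and ResourceManager logs instead of the application's own container logs, since the container may never have produced any.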