Can't find Spark application output


Problem Description

I have a cluster that I can launch successfully; at least, that's what the web UI shows, where I see this information:

URL: spark://Name25:7077
REST URL: spark://Name25:6066 (cluster mode)
Alive Workers: 10
Cores in use: 192 Total, 0 Used
Memory in use: 364.0 GB Total, 0.0 B Used
Applications: 0 Running, 5 Completed
Drivers: 0 Running, 5 Completed
Status: ALIVE

I used the submit command to run my application. If I run it this way:

./bin/spark-submit --class myapp.Main --master spark://Name25:7077 --deploy-mode cluster /home/lookupjar/myapp-0.0.1-SNAPSHOT.jar /home/etud500.csv  /home/

I get this message:

Running Spark using the REST application submission protocol.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/08/31 15:55:16 INFO RestSubmissionClient: Submitting a request to launch an application in spark://Name25:7077.
16/08/31 15:55:27 WARN RestSubmissionClient: Unable to connect to server spark://Name25:7077.
Warning: Master endpoint spark://Name25:7077 was not a REST server. Falling back to legacy submission gateway instead.
16/08/31 15:55:28 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

And if I run it this way:

./bin/spark-submit --class myapp.Main --master spark://Name25:6066 --deploy-mode cluster /home/lookupjar/myapp-0.0.1-SNAPSHOT.jar /home//etud500.csv  /home/result

I get this message:

Running Spark using the REST application submission protocol.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/08/31 16:59:06 INFO RestSubmissionClient: Submitting a request to launch an application in spark://Name25:6066.
16/08/31 16:59:06 INFO RestSubmissionClient: Submission successfully created as driver-20160831165906-0004. Polling submission state...
16/08/31 16:59:06 INFO RestSubmissionClient: Submitting a request for the status of submission driver-20160831165906-0004 in spark://Name25:6066.
16/08/31 16:59:06 INFO RestSubmissionClient: State of driver driver-20160831165906-0004 is now RUNNING.
16/08/31 16:59:06 INFO RestSubmissionClient: Driver is running on worker worker-20160831143117-10.0.10.48-38917 at 10.0.10.48:38917.
16/08/31 16:59:06 INFO RestSubmissionClient: Server responded with CreateSubmissionResponse:
{
  "action" : "CreateSubmissionResponse",
  "message" : "Driver successfully submitted as driver-20160831165906-0004",
  "serverSparkVersion" : "2.0.0",
  "submissionId" : "driver-20160831165906-0004",
  "success" : true
}

I think it succeeded, but my application should have written 3 outputs to the given path (/home/result), because my code contains:

path = args[1];
rdd1.saveAsTextFile(path+"/rdd1");
rdd2.saveAsTextFile(path+"/rdd2");
rdd3.saveAsTextFile(path+"/rdd3");
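
For context, here is a minimal, self-contained sketch of what myapp.Main might look like. Only the args[1] handling and the three saveAsTextFile calls come from the question; the class structure and the placeholder transformations (filter, map, distinct) are hypothetical stand-ins for the real logic:

package myapp;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class Main {
    public static void main(String[] args) {
        // args[0] is the input CSV and args[1] the output directory,
        // matching the two arguments on the spark-submit line above.
        SparkConf conf = new SparkConf().setAppName("myapp");
        JavaSparkContext sc = new JavaSparkContext(conf);

        String path = args[1];
        JavaRDD<String> lines = sc.textFile(args[0]);

        // Hypothetical transformations; the question does not show the real ones.
        JavaRDD<String> rdd1 = lines.filter(l -> !l.isEmpty());
        JavaRDD<String> rdd2 = lines.map(String::toUpperCase);
        JavaRDD<String> rdd3 = lines.distinct();

        rdd1.saveAsTextFile(path + "/rdd1");
        rdd2.saveAsTextFile(path + "/rdd2");
        rdd3.saveAsTextFile(path + "/rdd3");

        sc.stop();
    }
}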

Question 1: Why does it ask me to use "spark://Name25:6066" rather than "spark://Name25:7077"? According to the Spark website, we use :7077.

Question 2: If it indicates that the submission succeeded and the application completed, why don't I find the 3 output folders?

Recommended Answer

Submitting via port 6066 does NOT indicate that your job completed successfully. It just sends the request; the job runs in the background. You have to check the Spark UI for the job's completion status.
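
Besides the web UI, you can poll the driver's state from the command line. In standalone cluster mode, spark-submit accepts a --status option that takes the submission ID reported in the log above:

./bin/spark-submit --master spark://Name25:6066 --status driver-20160831165906-0004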

If the job completed and generated output files, you can check for them with:

hadoop dfs -ls <path>/rdd1
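
For example, with the output path from the submit command above, check each of the three directories (this assumes the path is on HDFS; hdfs dfs is the modern replacement for the deprecated hadoop dfs form):

hadoop dfs -ls /home/result/rdd1
hadoop dfs -ls /home/result/rdd2
hadoop dfs -ls /home/result/rdd3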
