How can I tell if my Spark job is progressing?
Problem description
I have a Spark job running on YARN, and it appears to just hang without doing any computation.
Here's what YARN says when I run yarn application -status <APPLICATION ID>:
Application Report :
Application-Id : applicationID
Application-Name : test app
Application-Type : SPARK
User : ec2-user
Queue : default
Start-Time : 1491005660004
Finish-Time : 0
Progress : 10%
State : RUNNING
Final-State : UNDEFINED
Tracking-URL : http://<ip>:4040
RPC Port : 0
AM Host : <host ip>
Aggregate Resource Allocation : 36343926 MB-seconds, 9818 vcore-seconds
Log Aggregation Status : NOT_START
Diagnostics :
And when I check yarn application -list, it says that the application is RUNNING. But I'm not sure I trust that. When I go to the Spark web UI, I see only one stage for the entire few hours I've been running it.
Also, when I click on the "Stages" tab, I see nothing running.
How do I ensure that my application is actually running and that YARN is not lying to me?
I would actually prefer for this to throw an error rather than keep me waiting to see whether the job is actually running. How do I do that?
Recommended answer
On the Spark application UI, if you click the link "parquet at Nativexxxx", it shows the details for the running stage.
On that screen there is a column "Input Size/Records". If your job is progressing, the number shown in that column will change.
It basically shows the number of records read by your executors.
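The same check can be automated instead of watching the UI by hand. The following is a minimal sketch, assuming the driver's monitoring REST API is reachable at the Tracking-URL from the YARN report (for example http://<ip>:4040) and that the stage entries it returns carry the inputRecords and numCompleteTasks counters. The function names, the polling interval, and the 30-minute stall threshold are all illustrative choices, not anything from the question or from Spark itself. It raises an error when neither counter has moved for the chosen period, which is one way to get a failure rather than an indefinite wait.

```python
import json
import time
import urllib.request


def fetch_stages(base_url, app_id):
    """Fetch the stage list from the Spark monitoring REST API.

    base_url is assumed to be the driver UI address, e.g. the
    Tracking-URL printed by `yarn application -status`.
    """
    url = f"{base_url}/api/v1/applications/{app_id}/stages"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def progress_counters(stages):
    """Sum the counters that should grow while the job is doing work."""
    return (
        sum(s.get("inputRecords", 0) for s in stages),
        sum(s.get("numCompleteTasks", 0) for s in stages),
    )


def has_progressed(before, after):
    """Return True if any counter increased between two snapshots."""
    old, new = progress_counters(before), progress_counters(after)
    return any(n > o for o, n in zip(old, new))


def watch(base_url, app_id, poll_seconds=60, stall_seconds=1800):
    """Poll the stages and raise once no counter has moved for stall_seconds.

    Both thresholds are hypothetical defaults; tune them to the job.
    """
    last = fetch_stages(base_url, app_id)
    last_change = time.monotonic()
    while True:
        time.sleep(poll_seconds)
        current = fetch_stages(base_url, app_id)
        if has_progressed(last, current):
            last_change = time.monotonic()
        elif time.monotonic() - last_change > stall_seconds:
            raise RuntimeError(f"No progress for {stall_seconds} seconds")
        last = current
```

Calling watch("http://<ip>:4040", "<application id>") blocks until the job stalls. Once the application ends, the driver UI goes away and fetch_stages fails with a connection error, so a caller would likely want to treat that case separately. A stage that only shuffles data may legitimately leave inputRecords unchanged for a while, which is why completed tasks are counted as well.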