What is scheduler delay in the Spark UI's event timeline?


Question

I am running Spark programs on YARN, with the option --master yarn-cluster.

When I open a Spark application's application master, I see a lot of Scheduler Delay in a stage. Some of the delays are longer than 10 minutes. What are they, and why do they take so long?

Update: Operations like aggregateByKey usually take much more time (i.e. scheduler delay) before the executors really start doing tasks. Why is that?

Recommended answer

Open "Show Additional Metrics" (click the right-pointing triangle so it points down) and hover over the check box for "Scheduler Delay". It shows this tooltip:

Scheduler delay includes time to ship the task from the scheduler to the executor, and time to send the task result from the executor to the scheduler. If scheduler delay is large, consider decreasing the size of tasks or decreasing the size of task results.
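To make the tooltip concrete: as far as I can tell, the UI does not time the shipping steps directly. It reports scheduler delay as whatever is left of a task's wall-clock duration once the separately measured phases are subtracted. Below is a minimal Python sketch of that arithmetic. The class, field names, and sample numbers are made up for illustration and are not Spark's actual API.

```python
from dataclasses import dataclass


@dataclass
class TaskTimings:
    """Per-task timings in milliseconds (hypothetical field names)."""
    duration_ms: int          # launch to finish, as seen by the driver
    executor_run_ms: int      # time the executor spent running the task
    deserialize_ms: int       # time the executor spent deserializing the task
    result_serialize_ms: int  # time the executor spent serializing the result
    getting_result_ms: int    # time the driver spent fetching the result


def scheduler_delay_ms(t: TaskTimings) -> int:
    """Time not accounted for by any measured phase, floored at zero."""
    accounted = (t.executor_run_ms + t.deserialize_ms
                 + t.result_serialize_ms + t.getting_result_ms)
    return max(0, t.duration_ms - accounted)


# Illustrative numbers only: a 12 s task with 10 s of measured work.
task = TaskTimings(duration_ms=12_000, executor_run_ms=9_000,
                   deserialize_ms=500, result_serialize_ms=300,
                   getting_result_ms=200)
print(scheduler_delay_ms(task))  # 2000
```

Read this way, anything that stretches the unmeasured part of a task's lifetime shows up as scheduler delay, including shipping a large task to the executor or a large result back, which is why the tooltip suggests shrinking both.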

The scheduler is the part of the Spark driver that divides a job into stages of tasks and works with the underlying cluster infrastructure to distribute them around the cluster. In yarn-cluster mode the driver runs inside the YARN application master.
