What does "Stage Skipped" mean in the Apache Spark web UI?


Question

In my Spark UI, some stages are marked as skipped. What does "skipped" mean?

Recommended answer

Typically it means that the stage's output was already available (fetched from cache or from previously written shuffle files), so there was no need to re-execute the stage. This is consistent with your DAG, which shows that the next stage requires a shuffle (reduceByKey). Whenever a shuffle is involved, Spark automatically caches the generated data:

Shuffle also generates a large number of intermediate files on disk. As of Spark 1.3, these files are preserved until the corresponding RDDs are no longer used and are garbage collected. This is done so the shuffle files don’t need to be re-created if the lineage is re-computed.
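
The behaviour described above can be illustrated with a small toy model. This is only a sketch in plain Python, not Spark code and not how Spark's scheduler is actually implemented: the `ToyScheduler` class, its methods, and the log strings are all hypothetical names invented for illustration. It assumes a two-stage job (a map stage that writes shuffle output, then a reduce stage) and shows that a second job over the same shuffle finds the map output already present and skips that stage.

```python
from collections import defaultdict


class ToyScheduler:
    """Toy model of stage skipping (hypothetical; not Spark's real scheduler)."""

    def __init__(self):
        # shuffle_id -> map-side output, kept between jobs like shuffle files on disk
        self.shuffle_files = {}
        self.log = []

    def run_map_stage(self, shuffle_id, records):
        """Write shuffle output for this shuffle, unless it already exists."""
        if shuffle_id in self.shuffle_files:
            self.log.append("map stage: skipped")
        else:
            buckets = defaultdict(list)
            for key, value in records:
                buckets[key].append(value)
            self.shuffle_files[shuffle_id] = dict(buckets)
            self.log.append("map stage: executed")
        return self.shuffle_files[shuffle_id]

    def run_job(self, shuffle_id, records, action):
        """Run one job: map stage (possibly skipped), reduce stage, then the action."""
        grouped = self.run_map_stage(shuffle_id, records)
        reduced = {key: sum(values) for key, values in grouped.items()}
        self.log.append("reduce stage: executed")
        return action(reduced)

    def release(self, shuffle_id):
        """Drop shuffle output, loosely mirroring cleanup once the RDD is unused."""
        self.shuffle_files.pop(shuffle_id, None)


scheduler = ToyScheduler()
pairs = [("a", 1), ("b", 2), ("a", 3)]

collected = scheduler.run_job(0, pairs, dict)  # first job: both stages run
counted = scheduler.run_job(0, pairs, len)     # second job: map stage is skipped

for entry in scheduler.log:
    print(entry)
```

In this model, calling `release(0)` before the second job would force the map stage to run again, which corresponds to the quoted point that the files are kept only while the corresponding RDD is still in use.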
