Why does sortBy transformation trigger a Spark job?


Question

As per the Spark documentation, only RDD actions can trigger a Spark job; transformations are lazily evaluated and run only when an action is called on the RDD.

Yet I see that the sortBy transformation is applied immediately, and it shows up as a job in the Spark UI. Why?

Answer

sortBy is implemented using sortByKey, which depends on a RangePartitioner (on the JVM) or a partitioning function (in Python). When you call sortBy / sortByKey, the partitioner (or partitioning function) is initialized eagerly, and it samples the input RDD to compute partition boundaries. The job you see in the UI corresponds to this sampling pass.
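The sampling step can be sketched in plain Python. This is a toy model of the idea, not Spark's actual RangePartitioner; `compute_boundaries` is a hypothetical helper that picks evenly spaced keys from a sorted sample, which is roughly what the eager job produces:

```python
import random

def compute_boundaries(sample, num_partitions):
    """Pick num_partitions - 1 boundary keys from a sorted sample,
    spaced evenly, so each partition covers a roughly equal key range.
    (Toy stand-in for RangePartitioner's boundary selection.)"""
    s = sorted(sample)
    step = len(s) / num_partitions
    return [s[int(step * i)] for i in range(1, num_partitions)]

random.seed(0)
data = [random.randint(0, 1000) for _ in range(10_000)]

# This sampling pass is the work the "extra" job performs:
sample = random.sample(data, 200)
bounds = compute_boundaries(sample, 4)
print(bounds)  # three boundary keys splitting the key space into 4 ranges
```

With the boundaries known, every record can later be routed to the right partition in one shuffle, which is why Spark pays for the sampling pass up front.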

The actual sorting is performed only when you execute an action on the newly created RDD or its descendants.
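The split between the eager boundary computation and the deferred sort can be modeled with a small toy class. This is illustrative pure Python under assumed behavior, not Spark code; `ToySortedRDD` and `collect` here only mimic the pattern:

```python
class ToySortedRDD:
    """Toy model: partition boundaries are computed eagerly at
    construction time (the job you see in the UI), but the full
    sort runs only when an action such as collect() is invoked."""

    def __init__(self, data, num_partitions):
        self.data = data
        # Eager phase: cheap deterministic sample + boundary selection.
        sample = sorted(data[::max(1, len(data) // 100)])
        step = max(1, len(sample) // num_partitions)
        self.bounds = sample[step::step][:num_partitions - 1]
        self._sorted = None  # nothing sorted yet

    def collect(self):
        # Lazy phase: the real sort happens only on the action.
        if self._sorted is None:
            self._sorted = sorted(self.data)
        return self._sorted

rdd = ToySortedRDD(list(range(999, -1, -1)), num_partitions=4)
print(rdd.bounds)          # boundaries exist before any action
print(rdd._sorted is None) # True: no sorting has happened yet
print(rdd.collect()[:5])   # the action triggers the actual sort
```

So in Spark terms: the job you observe at `sortBy` time is the boundary-sampling phase, while the shuffle-and-sort work is charged to whichever action you run later.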

