Same set of tasks is repeated in multiple stages of a Spark job

Problem Description

A group of tasks consisting of filters and maps appears in the DAG visualization across multiple stages. Does this mean the same transformations are recomputed in all of those stages? If so, how can this be resolved?

Recommended Answer

Every action performed on a DataFrame recomputes all of the transformations in its lineage. This is because transformations are lazy: Spark does not evaluate them until an action forces it to.
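
A minimal sketch of this behavior (the object name, data, and column name are made up for illustration): the filter below only records a step in the plan, and each action replays the whole lineage, which is why the same tasks show up in several stages of the DAG.

```scala
import org.apache.spark.sql.SparkSession

object LazyDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("LazyDemo").master("local[*]").getOrCreate()
    import spark.implicits._

    val nums = Seq(1, 2, 3, 4, 5).toDF("n")

    // Transformation only: this records a step in the plan, nothing runs yet.
    val evens = nums.filter($"n" % 2 === 0)

    // Each action re-executes the full lineage, including the filter above.
    println(evens.count()) // first job: runs the filter
    evens.show()           // second job: runs the filter again

    spark.stop()
  }
}
```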

If you only have a single action, there is nothing to resolve. However, when several actions follow one another, you can call cache() after the last shared transformation. Spark will then keep the DataFrame in memory after the first computation, making subsequent actions much faster.
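
A sketch of that placement, assuming a hypothetical input file and column names: cache() is attached to the end of the shared filter/map lineage, so the first action materializes it once and later actions reuse the in-memory result.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object CacheDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("CacheDemo").master("local[*]").getOrCreate()

    // Hypothetical input path and column names, for illustration only.
    val events = spark.read.json("events.json")

    // Shared lineage of filter + map-style transformations,
    // cached so it is materialized only once.
    val cleaned = events
      .filter(col("status") === "ok")
      .withColumn("doubled", col("value") * 2)
      .cache()

    // The first action computes the lineage and fills the cache.
    println(cleaned.count())

    // Later actions read from the cache instead of recomputing the
    // filter/map stages, so those tasks no longer repeat across stages.
    cleaned.show(10)

    spark.stop()
  }
}
```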
