Spark Transformation - Why is it lazy and what is the advantage?

Question

Spark transformations are lazily evaluated: when we call an action, Spark executes all the transformations based on the lineage graph.

What is the advantage of having transformations lazily evaluated?

Will it improve performance and reduce memory consumption compared to eager evaluation?

Is there any disadvantage to having transformations lazily evaluated?

Recommended answer

For transformations, Spark adds them to a DAG of computation, and only when the driver requests some data does this DAG actually get executed.
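To make the "record now, run on request" idea concrete, here is a minimal sketch in plain Python. It is not Spark's API: `LazyDataset`, its `map`/`filter`/`collect` methods, and the `double` helper are hypothetical names invented for illustration. Transformations only append to a recorded plan, and the data is touched when the action `collect()` is called.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not Spark code)."""

    def __init__(self, source: Iterable, plan: Tuple = ()):
        self._source = source
        self._plan = plan  # recorded lineage: a tuple of (kind, function) steps

    def map(self, fn: Callable) -> "LazyDataset":
        # Transformation: only extends the plan, touches no data.
        return LazyDataset(self._source, self._plan + (("map", fn),))

    def filter(self, pred: Callable) -> "LazyDataset":
        # Transformation: only extends the plan, touches no data.
        return LazyDataset(self._source, self._plan + (("filter", pred),))

    def _iterate(self) -> Iterator:
        # Replay the recorded steps as one chained iterator over the source.
        stream = iter(self._source)
        for kind, fn in self._plan:
            stream = map(fn, stream) if kind == "map" else filter(fn, stream)
        return stream

    def collect(self) -> List:
        # Action: this is the point where the plan actually runs.
        return list(self._iterate())


calls = []  # records every element that `double` actually processes


def double(x: int) -> int:
    calls.append(x)
    return x * 2


ds = LazyDataset(range(5)).map(double).filter(lambda x: x > 2)
calls_before_action = list(calls)  # snapshot taken before any action
result = ds.collect()

print(calls_before_action)  # [] -> nothing ran while the plan was being built
print(result)               # [4, 6, 8]
```

The real engine is far more involved (partitions, stages, shuffles), but the split between building a plan and triggering it with an action is the same idea.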

One advantage of this is that Spark can make many optimization decisions once it has had a chance to look at the DAG in its entirety. This would not be possible if it executed each transformation as soon as it received it.
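One small illustration of why seeing the whole plan matters, again as a plain-Python analogy rather than Spark code: if the final request only needs two results, a lazy pipeline can stop after producing two, while an eager one has already paid for everything. The `expensive` function and the variable names below are assumptions made up for this sketch.

```python
from itertools import islice

processed = []  # records every input that `expensive` is actually applied to


def expensive(x: int) -> int:
    processed.append(x)
    return x * x


# Eager: the whole transformation runs before we learn that only
# two results are wanted.
eager_all = [expensive(x) for x in range(1000)]
eager_first_two = eager_all[:2]
eager_work = len(processed)

processed.clear()

# Lazy: the generator only describes the work; the final request
# (take the first two) decides how much of it is carried out.
lazy_plan = (expensive(x) for x in range(1000))
lazy_first_two = list(islice(lazy_plan, 2))
lazy_work = len(processed)

print(eager_work, lazy_work)  # 1000 2
```

Both versions return the same two values; the only difference is how much work was done to get them.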

For example, if you executed every transformation eagerly, what would that mean? It means you would have to materialize just as many intermediate datasets in memory. This is evidently not efficient; for one, it increases your GC costs. (You are not really interested in those intermediate results as such. They are just convenient abstractions for you while writing the program.) So what you do instead is tell Spark what final answer you are interested in, and it figures out the best way to get there.
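The cost of intermediate datasets can be sketched the same way. This is plain Python, not Spark, and the step names are invented for illustration: the eager version builds a full list at every step, while the lazy version chains generators so each element flows through all steps in a single pass and no intermediate list is ever held.

```python
data = range(100_000)

# Eager style: every step materializes a complete intermediate list.
step1 = [x + 1 for x in data]
step2 = [x * 2 for x in step1]
step3 = [x for x in step2 if x % 3 == 0]
eager_total = sum(step3)
# Number of elements that were held in intermediate lists along the way.
eager_intermediate_elements = len(step1) + len(step2) + len(step3)

# Lazy style: generators only describe the steps; sum() pulls one element
# at a time through the whole chain, so no intermediate list is built.
lazy1 = (x + 1 for x in data)
lazy2 = (x * 2 for x in lazy1)
lazy3 = (x for x in lazy2 if x % 3 == 0)
lazy_total = sum(lazy3)

print(eager_total == lazy_total)    # True
print(eager_intermediate_elements)  # well over 200,000 elements materialized
```

The answer is identical in both cases; only the eager version had to allocate, and later garbage-collect, the intermediate collections.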
