Spark Transformation - Why is it lazy and what is the advantage?

Question

Spark transformations are lazily evaluated: when we call an action, Spark executes all the transformations based on the lineage graph.

What is the advantage of having transformations lazily evaluated?

Does it improve performance and reduce memory consumption compared to eager evaluation?

Is there any disadvantage to having transformations lazily evaluated?

Answer

For transformations, Spark adds them to a DAG of computation, and only when the driver requests some data does this DAG actually get executed.
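
To make the idea concrete, here is a toy Python model of a lazily evaluated dataset. It is not Spark's API: the `LazyDataset` class and its `map`, `filter`, and `collect` methods are hypothetical stand-ins for Spark transformations and an action, and the recorded `lineage` is a flattened, simplified stand-in for Spark's lineage graph. Transformations only record a step; the work happens when the action runs.

```python
from typing import Any, Callable, Iterable, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]

class LazyDataset:
    """Toy model of a lazily evaluated dataset (hypothetical, NOT Spark's API).

    Transformations only append a step to the recorded lineage;
    nothing is computed until the action collect() is called.
    """

    def __init__(self, source: Iterable[Any], lineage: Tuple[Step, ...] = ()):
        self._source = source
        self.lineage = lineage

    def map(self, fn: Callable[[Any], Any]) -> "LazyDataset":
        # Record the step and return a new dataset; fn is not called here.
        return LazyDataset(self._source, self.lineage + (("map", fn),))

    def filter(self, pred: Callable[[Any], bool]) -> "LazyDataset":
        # Record the step; pred is not called here either.
        return LazyDataset(self._source, self.lineage + (("filter", pred),))

    def collect(self) -> List[Any]:
        # Action: replay the whole lineage over each source element.
        out = []
        for item in self._source:
            keep = True
            for kind, fn in self.lineage:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    keep = False
                    break
            if keep:
                out.append(item)
        return out


calls = []

def square(x: int) -> int:
    calls.append(x)  # log every time the transformation actually runs
    return x * x

ds = LazyDataset(range(5)).map(square).filter(lambda v: v % 2 == 0)
print(calls)         # []  (no transformation has run yet)
print(ds.collect())  # [0, 4, 16]
print(len(calls))    # 5
```

Building `ds` only records two steps; `square` runs for the first time inside `collect()`.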

One advantage of this is that Spark can make many optimization decisions after it has had a chance to look at the DAG in its entirety. This would not be possible if it executed everything as soon as it received it.
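
One kind of decision this enables is pipelining with early stopping. The sketch below is plain Python, not Spark, and the function names `eager_take` and `lazy_take` are hypothetical. It compares an eager map-then-filter, which finishes each step over the whole input, with a lazy version built from chained generators. Because the lazy version knows the final request ("the first 3 results") before any work starts, it pushes each element through both steps in one pass and stops as soon as it has enough.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, List

def eager_take(data: Iterable[int], fn: Callable[[int], int],
               pred: Callable[[int], bool], n: int,
               stats: Dict[str, int]) -> List[int]:
    # Eager: the map step runs over the whole input and is materialized
    # before the filter step even starts.
    mapped = []
    for x in data:
        stats["evaluated"] += 1
        mapped.append(fn(x))
    filtered = [y for y in mapped if pred(y)]
    return filtered[:n]

def lazy_take(data: Iterable[int], fn: Callable[[int], int],
              pred: Callable[[int], bool], n: int,
              stats: Dict[str, int]) -> List[int]:
    # Lazy: map and filter are chained generators, so islice pulls
    # elements through both steps one at a time and stops after n results.
    def mapped():
        for x in data:
            stats["evaluated"] += 1
            yield fn(x)
    filtered = (y for y in mapped() if pred(y))
    return list(islice(filtered, n))

def triple(x: int) -> int:
    return x * 3

def is_even(y: int) -> bool:
    return y % 2 == 0

data = range(100_000)
eager_stats = {"evaluated": 0}
lazy_stats = {"evaluated": 0}
eager_result = eager_take(data, triple, is_even, 3, eager_stats)
lazy_result = lazy_take(data, triple, is_even, 3, lazy_stats)

print(eager_result, lazy_result)  # [0, 6, 12] [0, 6, 12]
# The eager version evaluates all 100,000 inputs; the lazy one only a handful.
print(eager_stats["evaluated"], lazy_stats["evaluated"])
```

Spark applies a related idea when it pipelines consecutive narrow transformations such as `map` and `filter` within one stage, although its planner is far more involved than this sketch.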

For example, if you executed every transformation eagerly, what would that mean? It means you would have to materialize many intermediate datasets in memory. This is evidently not efficient; for one, it will increase your GC costs (because you are not really interested in those intermediate results as such; they are just convenient abstractions for you while writing the program). So instead, you tell Spark what eventual answer you are interested in, and it figures out the best way to get there.
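
The memory point can be illustrated with another plain-Python sketch (not Spark; `eager_pipeline`, `lazy_pipeline`, and `peak_bytes` are illustrative names). Both pipelines apply the same three transformations and end with a `sum` action. The eager one materializes every intermediate list, while the lazy one streams each element through all steps. The standard library's `tracemalloc` reports the peak traced memory of each run; the exact numbers vary by Python version and platform.

```python
import tracemalloc
from typing import Callable, Tuple

N = 200_000

def eager_pipeline(n: int) -> int:
    # Every transformation materializes a full intermediate list,
    # and all of them stay alive until the action (sum) finishes.
    step1 = [x + 1 for x in range(n)]
    step2 = [x * 2 for x in step1]
    step3 = [x for x in step2 if x % 3 == 0]
    return sum(step3)

def lazy_pipeline(n: int) -> int:
    # Same transformations as chained generators: each element flows
    # through all steps before the next one is read, so no intermediate
    # collection is ever built.
    step1 = (x + 1 for x in range(n))
    step2 = (x * 2 for x in step1)
    step3 = (x for x in step2 if x % 3 == 0)
    return sum(step3)

def peak_bytes(fn: Callable[[int], int], n: int) -> Tuple[int, int]:
    # Trace allocations only while fn runs; return (result, peak bytes).
    tracemalloc.start()
    result = fn(n)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, peak

eager_result, eager_peak = peak_bytes(eager_pipeline, N)
lazy_result, lazy_peak = peak_bytes(lazy_pipeline, N)
print(eager_result == lazy_result)  # True
print(f"eager peak: {eager_peak:,} bytes, lazy peak: {lazy_peak:,} bytes")
```

Both runs produce the same answer; only the eager one pays for holding the intermediate results, which is the cost the answer describes.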