Is a DAG created when we perform operations on DataFrames?


Question

I have seen a DAG being generated whenever we perform an operation on an RDD, but what happens when we perform operations on a DataFrame?

When multiple operations are executed on a DataFrame, are they lazily evaluated, just as with RDDs?

When does the Catalyst optimizer come into the picture?

I am somewhat confused about these points. If anyone could shed some light on these topics, it would be a great help.

Thanks.

Recommended answer

Every operation on a Dataset (in Scala, a DataFrame is simply a Dataset of Row objects), continuous processing mode notwithstanding, is translated into a sequence of operations on internal RDDs. Therefore, the concept of a DAG is by all means applicable.

By extension, execution is primarily lazy, though, as always, exceptions exist, and they are more common in the Dataset API than in the pure RDD API.
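To make the idea of lazy evaluation concrete, here is a minimal pure-Python sketch. It is not Spark code: `LazyFrame`, `is_adult`, and the sample rows are hypothetical names invented for illustration. Transformations only append a step to a recorded plan (a very simple, linear DAG), and nothing is computed until an action such as `collect()` is called.

```python
class LazyFrame:
    """Toy stand-in for a DataFrame: records operations instead of running them."""

    def __init__(self, rows, plan=()):
        self._rows = rows
        self.plan = tuple(plan)  # ordered steps, i.e. a linear DAG

    def filter(self, predicate):
        # Transformation: return a new frame with one more recorded step.
        return LazyFrame(self._rows, self.plan + (("filter", predicate),))

    def select(self, *columns):
        # Transformation: also only recorded, never executed here.
        return LazyFrame(self._rows, self.plan + (("select", columns),))

    def collect(self):
        # Action: walk the recorded plan and actually compute the result.
        rows = self._rows
        for kind, arg in self.plan:
            if kind == "filter":
                rows = [r for r in rows if arg(r)]
            else:
                rows = [{c: r[c] for c in arg} for r in rows]
        return list(rows)


calls = []  # records every row the predicate is evaluated on


def is_adult(row):
    calls.append(row["name"])
    return row["age"] >= 18


people = LazyFrame([{"name": "Ann", "age": 31}, {"name": "Bob", "age": 12}])
adults = people.filter(is_adult).select("name")

calls_before_action = len(calls)  # the predicate has not run yet
result = adults.collect()         # the action triggers the whole plan
```

In this sketch the predicate is invoked only once `collect()` runs, which mirrors the transformation/action split described above. In PySpark itself, `df.explain(extended=True)` prints the plans that have been built up without executing them.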

Finally, Catalyst is responsible for transforming Dataset API calls into logical, optimized logical, and physical execution plans, and finally for generating the code that will be executed by the tasks.
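The stages named above can be sketched with a toy pipeline in plain Python. This is an illustration under loud assumptions, not Catalyst's implementation: `Scan`, `Filter`, `optimize`, `to_physical`, and `generate_code` are hypothetical names, the single rewrite rule (merging stacked filters) stands in for a whole rule set, and Python source is generated where Spark would generate JVM code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Filter:
    condition: str  # Python expression over the variable `row`
    child: object


def optimize(plan):
    """Logical -> optimized logical: merge stacked filters into one."""
    if isinstance(plan, Filter):
        child = optimize(plan.child)
        if isinstance(child, Filter):
            merged = f"({child.condition}) and ({plan.condition})"
            return Filter(merged, child.child)
        return Filter(plan.condition, child)
    return plan


def to_physical(plan):
    """Optimized logical -> physical: an ordered list of steps, source first."""
    if isinstance(plan, Scan):
        return [("scan", plan.table)]
    return to_physical(plan.child) + [("filter", plan.condition)]


def generate_code(physical):
    """Physical -> code: emit Python source for the pipeline and compile it."""
    lines = ["def run(tables):", f"    rows = tables[{physical[0][1]!r}]"]
    for _, condition in physical[1:]:
        lines.append(f"    rows = [row for row in rows if {condition}]")
    lines.append("    return rows")
    source = "\n".join(lines)
    namespace = {}
    exec(source, namespace)  # acceptable here: the source is built from our own plan
    return source, namespace["run"]


# Two stacked filters over a hypothetical "people" table.
logical = Filter("row['age'] < 65", Filter("row['age'] >= 18", Scan("people")))
optimized = optimize(logical)
physical = to_physical(optimized)
source, run = generate_code(physical)

tables = {"people": [{"age": 12}, {"age": 31}, {"age": 70}]}
result = run(tables)
```

Printing `source` shows the single generated function that a task would run in this toy model. The point of the design is that the user's calls only describe the computation, which leaves the planner free to rewrite it before any code is produced.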
