Difference between DataFrame, Dataset, and RDD in Spark

Published: 2021/11/14 21:14:06. Tags: dataframe, apache-spark, apache-spark-sql, rdd, apache-spark-dataset


Question

What is the difference between an RDD and a DataFrame in Apache Spark? (As of Spark 2.0.0, DataFrame is merely a type alias for Dataset[Row].)

Can you convert one to the other?

Recommended answer

A Google search for "DataFrame definition" gives a good definition of a DataFrame:

"A data frame is a table, or two-dimensional array-like structure, in which each column contains measurements on one variable, and each row contains one case."

So, because of its tabular format, a DataFrame carries additional metadata (column names and types), which allows Spark to run certain optimizations on the finalized query.
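As a minimal sketch of what that metadata looks like in practice, the Scala snippet below builds a small DataFrame, prints its schema, and asks Spark to print the plans it derives for a simple query. The Person case class, the object and method names, the sample rows, and the local master setting are all assumptions made for illustration; they do not come from the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object SchemaSketch {
  // Hypothetical record type, used only for illustration.
  final case class Person(name: String, age: Int)

  // Builds a two-row DataFrame whose columns are derived from Person's fields.
  def people(spark: SparkSession): DataFrame = {
    import spark.implicits._ // brings toDF into scope for local collections
    Seq(Person("Ann", 34), Person("Bob", 19)).toDF()
  }

  // A declarative query: Spark sees which columns are filtered and selected.
  def adultNames(df: DataFrame): DataFrame =
    df.filter(col("age") > 21).select("name")

  def main(args: Array[String]): Unit = {
    // Local session for experimentation; app name and master are arbitrary choices.
    val spark = SparkSession.builder()
      .appName("schema-sketch")
      .master("local[*]")
      .getOrCreate()

    val df = people(spark)

    // The schema (column names and types) is the extra metadata.
    df.printSchema()

    // explain(true) prints the parsed, analyzed, optimized, and physical plans.
    adultNames(df).explain(true)

    spark.stop()
  }
}
```

The exact plan text varies between Spark versions, so treat the printed output as something to explore rather than a fixed result.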

An RDD, on the other hand, is merely a Resilient Distributed Dataset: more of a black box of data that Spark cannot optimize, because the operations that can be performed against it are not as constrained.
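To make the "black box" point concrete, here is a small hedged sketch in Scala. The function passed to filter is an ordinary closure, which Spark runs as written without looking inside it. The object name, the adults helper, and the sample tuples are hypothetical and chosen only for illustration.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object RddSketch {
  // The predicate is arbitrary JVM code; Spark executes it as-is and
  // has no column-level information about what it reads or tests.
  def adults(rows: RDD[(String, Int)]): RDD[(String, Int)] =
    rows.filter { case (_, age) => age > 21 }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation; app name and master are arbitrary choices.
    val spark = SparkSession.builder()
      .appName("rdd-sketch")
      .master("local[*]")
      .getOrCreate()

    val rows = spark.sparkContext.parallelize(Seq(("Ann", 34), ("Bob", 19)))

    // toDebugString shows the RDD lineage, not an optimized query plan.
    println(adults(rows).toDebugString)
    adults(rows).collect().foreach(println)

    spark.stop()
  }
}
```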

However, you can go from a DataFrame to an RDD via its rdd method, and you can go from an RDD to a DataFrame (if the RDD is in a tabular format) via the toDF method.
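The two conversions can be sketched in Scala roughly as follows. This assumes an RDD of (String, Int) tuples as the "tabular" input; the column names "name" and "age", the object name, and the helper method names are illustrative assumptions, not part of the original answer.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

object ConversionSketch {
  // RDD -> DataFrame: toDF is made available by importing spark.implicits._
  def toDataFrame(spark: SparkSession, pairs: RDD[(String, Int)]): DataFrame = {
    import spark.implicits._
    pairs.toDF("name", "age") // column names are supplied here; types come from the tuple
  }

  // DataFrame -> RDD: the rdd method returns an RDD of generic Row objects.
  def toRdd(df: DataFrame): RDD[Row] = df.rdd

  def main(args: Array[String]): Unit = {
    // Local session for experimentation; app name and master are arbitrary choices.
    val spark = SparkSession.builder()
      .appName("conversion-sketch")
      .master("local[*]")
      .getOrCreate()

    val pairs = spark.sparkContext.parallelize(Seq(("Ann", 34), ("Bob", 19)))
    val df = toDataFrame(spark, pairs)
    df.printSchema()

    // Fields of a Row are read by position (or by name via getAs).
    toRdd(df).map(row => s"${row.getString(0)} is ${row.getInt(1)}")
      .collect()
      .foreach(println)

    spark.stop()
  }
}
```

Note that going back to an RDD yields Row objects rather than the original tuples, so the element type is not preserved across the round trip.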

In general, it is recommended to use a DataFrame where possible, because of the built-in query optimization.
