Apache Spark Dataset API: head(n:Int) vs take(n:Int)

Published: 2021/11/14 22:30:52 | Tags: apache-spark, apache-spark-sql, spark-dataframe


Question

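The Apache Spark Dataset API has two methods, head(n:Int) and take(n:Int).

The Dataset.scala source contains:

def take(n: Int): Array[T] = head(n)

I couldn't find any difference in the executed code between these two functions. Why does the API provide two different methods that yield the same result?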
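To make the one-line delegation concrete, here is a toy Python model. It is a minimal sketch, not Spark: the class `ToyDataset` is hypothetical, stores rows in an ordinary list, and only mimics the relationship `take(n) = head(n)` quoted from Dataset.scala. Because `take` just forwards to `head`, two calls with the same `n` necessarily return the same rows.

```python
from typing import Generic, List, Sequence, TypeVar

T = TypeVar("T")


class ToyDataset(Generic[T]):
    """Hypothetical in-memory stand-in for a Dataset (not Spark's implementation)."""

    def __init__(self, rows: Sequence[T]) -> None:
        self._rows = list(rows)

    def head(self, n: int) -> List[T]:
        # Spark runs a limit(n) query and collects the result;
        # this toy simply slices an in-memory list.
        if n < 0:
            # Rejecting negative n is a choice made for this toy model.
            raise ValueError("n must be non-negative")
        return self._rows[:n]

    def take(self, n: int) -> List[T]:
        # Mirrors `def take(n: Int): Array[T] = head(n)`: pure delegation,
        # so there is no separate execution path to differ.
        return self.head(n)


if __name__ == "__main__":
    ds = ToyDataset(["Product1", "Product2", "Product3"])
    print(ds.head(2))  # ['Product1', 'Product2']
    print(ds.take(2))  # ['Product1', 'Product2']
```

In this model, asking for more rows than exist simply returns all of them, for both methods alike.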

Answer

I experimented and found that head(n) and take(n) return exactly the same output. In PySpark, both return a list of Row objects.

>>> DF.head(2)

[Row(Transaction_date=u'1/2/2009 6:17', Product=u'Product1', Price=u'1200', Payment_Type=u'Mastercard', Name=u'carolina', City=u'Basildon', State=u'England', Country=u'United Kingdom'), Row(Transaction_date=u'1/2/2009 4:53', Product=u'Product2', Price=u'1200', Payment_Type=u'Visa', Name=u'Betina', City=u'Parkville', State=u'MO', Country=u'United States')]

>>> DF.take(2)

