Apache Spark Dataset API: head(n: Int) vs take(n: Int)
Problem description
The Apache Spark Dataset API has two methods, head(n: Int) and take(n: Int).
The Dataset.scala source contains:
def take(n: Int): Array[T] = head(n)
I couldn't find any difference in the code these two functions execute. Why does the API have two different methods that yield the same result?
I have experimented and found that head(n) and take(n) give exactly the same output. Both return the result as a list of Row objects.
>>> DF.head(2)
[Row(Transaction_date=u'1/2/2009 6:17', Product=u'Product1', Price=u'1200', Payment_Type=u'Mastercard', Name=u'carolina', City=u'Basildon', State=u'England', Country=u'United Kingdom'), Row(Transaction_date=u'1/2/2009 4:53', Product=u'Product2', Price=u'1200', Payment_Type=u'Visa', Name=u'Betina', City=u'Parkville', State=u'MO', Country=u'United States')]
>>> DF.take(2)
[Row(Transaction_date=u'1/2/2009 6:17', Product=u'Product1', Price=u'1200', Payment_Type=u'Mastercard', Name=u'carolina', City=u'Basildon', State=u'England', Country=u'United Kingdom'), Row(Transaction_date=u'1/2/2009 4:53', Product=u'Product2', Price=u'1200', Payment_Type=u'Visa', Name=u'Betina', City=u'Parkville', State=u'MO', Country=u'United States')]
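The delegation quoted from Dataset.scala can be modelled without a Spark installation. The following is only a toy sketch: ToyDataset and its in-memory row list are hypothetical names invented for illustration and are not part of Spark. It shows why an alias that forwards to head(n) cannot return anything different from head(n).

```python
from typing import Generic, List, Sequence, TypeVar

T = TypeVar("T")


class ToyDataset(Generic[T]):
    """Hypothetical stand-in for a Dataset; this is not Spark code."""

    def __init__(self, rows: Sequence[T]) -> None:
        self._rows = list(rows)

    def head(self, n: int) -> List[T]:
        # Return the first n rows (fewer if the dataset is shorter).
        # Assumes n >= 0.
        return self._rows[:n]

    def take(self, n: int) -> List[T]:
        # Pure alias: forwards to head(n), mirroring the quoted Scala line
        # `def take(n: Int): Array[T] = head(n)`.
        return self.head(n)


ds = ToyDataset(["row1", "row2", "row3"])
print(ds.head(2) == ds.take(2))
```

Because take contains no logic of its own in this model, any difference between the two calls would have to come from head itself, which matches the identical outputs shown above.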