How to select the N highest values for each category in Spark Scala
Question
Say I have this dataset:
val main_df = Seq(("yankees-mets",8,20),("yankees-redsox",4,14),("yankees-mets",6,17),
("yankees-redsox",2,10),("yankees-mets",5,17),("yankees-redsox",5,10)).toDF("teams","homeruns","hits")
which looks like this:
I want to pivot on the teams column, and for all the other columns return the 2 (or N) highest values for that column. So for yankees-mets and homeruns, it would return this,
since the 2 highest homerun totals for them were 8 and 6.
How would I do this in the general case?
Thanks
Answer
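A common way to take the top N rows per group in Spark is a window function partitioned by the grouping column, ranked by the value column, then filtered on the rank. The sketch below (variable names like `topN` and `w` are illustrative, not from the original answer) applies this to the question's dataset for the homeruns column; the same window pattern works for any other column by changing the `orderBy`:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

val spark = SparkSession.builder()
  .appName("topN-per-group")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

val main_df = Seq(("yankees-mets",8,20),("yankees-redsox",4,14),("yankees-mets",6,17),
  ("yankees-redsox",2,10),("yankees-mets",5,17),("yankees-redsox",5,10))
  .toDF("teams","homeruns","hits")

val topN = 2

// Rank rows within each team by homeruns, highest first
val w = Window.partitionBy("teams").orderBy(col("homeruns").desc)

// Keep only the topN rows per team, then drop the helper column
val topHomeruns = main_df
  .withColumn("rank", row_number().over(w))
  .filter(col("rank") <= topN)
  .drop("rank")

// For yankees-mets this keeps the rows with homeruns 8 and 6;
// for yankees-redsox, the rows with homeruns 5 and 4
topHomeruns.show()
```

Note that `row_number` breaks ties arbitrarily (exactly N rows per group survive); use `rank` or `dense_rank` instead if tied values should all be kept.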