Take top N after groupBy and treat them as RDD

Question

I'd like to get the top N items per key after calling groupByKey on an RDD, and then convert topNPerGroup (below) to an RDD[(String, Int)] in which the List[Int] values are flattened into one row per value.

data

val data = sc.parallelize(Seq("foo"->3, "foo"->1, "foo"->2,
                              "bar"->6, "bar"->5, "bar"->4))

The top N items per group are computed as:

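import org.apache.spark.rdd.RDD

// Group the values by key, then keep the two largest values of each group.
val topNPerGroup: RDD[(String, List[Int])] = data.groupByKey.map {
   case (key, numbers) =>
       key -> numbers.toList.sortBy(-_).take(2) // sort descending, take the top 2
}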

Printing it with

topNPerGroup.collect.foreach(println)

currently gives

(bar,List(6, 5))
(foo,List(3, 2))

What I want instead is for topNPerGroup.collect.foreach(println) to print the following (expected result):

(bar, 6)
(bar, 5)
(foo, 3)
(foo, 2)

Answer

Your question is a little confusing, but I think this does what you're looking for:

val flattenedTopNPerGroup = 
    topNPerGroup.flatMap({case (key, numbers) => numbers.map(key -> _)})

In the REPL it prints what you want:

flattenedTopNPerGroup.collect.foreach(println)
(foo,3)
(foo,2)
(bar,6)
(bar,5)
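
The two steps (top N per group, then flatten) can also be folded into a single flatMap. The following is a minimal sketch of that idea, not part of the original answer. It uses plain Scala collections in place of the RDD so that it runs without a SparkContext, and the helper name topNFlattened is hypothetical. On an RDD, the same function body should work when passed to data.groupByKey.flatMap.

```scala
// Plain Scala collections stand in for the RDD here (an assumption made
// so that the snippet runs without Spark).
val data = Seq("foo" -> 3, "foo" -> 1, "foo" -> 2,
               "bar" -> 6, "bar" -> 5, "bar" -> 4)

// Hypothetical helper: keep the n largest values per key and emit
// one (key, value) pair per retained value.
def topNFlattened(pairs: Seq[(String, Int)], n: Int): Seq[(String, Int)] =
  pairs
    .groupBy(_._1)                    // Map[String, Seq[(String, Int)]]
    .toSeq
    .flatMap { case (key, kvs) =>
      kvs.map(_._2)                   // drop the key, keep the values
        .sortBy(-_)                   // sort descending
        .take(n)                      // top n of this group
        .map(key -> _)                // re-attach the key to each value
    }

val result = topNFlattened(data, 2)
result.foreach(println)
```

The order in which the groups are printed is not guaranteed, but within each group the values come out largest first.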
