Take the top N after groupBy and treat them as an RDD
Problem description
I'd like to get the top N items per key after a groupByKey on an RDD, and convert the type of topNPerGroup (below) to RDD[(String, Int)], with the List[Int] values flattened.
data is:
val data = sc.parallelize(Seq("foo"->3, "foo"->1, "foo"->2,
"bar"->6, "bar"->5, "bar"->4))
The top N items per group are computed as:
import org.apache.spark.rdd.RDD

// Keep the 2 largest values for each key (N = 2 here)
val topNPerGroup: RDD[(String, List[Int])] = data.groupByKey.map {
  case (key, numbers) =>
    key -> numbers.toList.sortBy(-_).take(2)
}
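To make the grouping step easy to try without a Spark cluster, here is a minimal local sketch that uses plain Scala collections as a stand-in for the RDD: Seq.groupBy plays the role of groupByKey, and the same sortBy(-_).take(n) keeps the n largest values per key. The object name TopNSketch and the helper topNPerGroup(data, n) are hypothetical, chosen only for illustration; this is not Spark's API.

```scala
// Plain Scala collections standing in for the RDD (no Spark required).
object TopNSketch {
  // Group (key, value) pairs by key and keep the n largest values per key,
  // mirroring groupByKey followed by sortBy(-_).take(n) in the question.
  def topNPerGroup(data: Seq[(String, Int)], n: Int): Map[String, List[Int]] =
    data
      .groupBy(_._1) // Map[String, Seq[(String, Int)]]
      .map { case (key, pairs) =>
        key -> pairs.map(_._2).toList.sortBy(-_).take(n)
      }

  def main(args: Array[String]): Unit = {
    val data = Seq("foo" -> 3, "foo" -> 1, "foo" -> 2,
                   "bar" -> 6, "bar" -> 5, "bar" -> 4)
    // Sort by key only so the printed order is stable.
    topNPerGroup(data, 2).toList.sortBy(_._1).foreach(println)
  }
}
```

Running main prints (bar,List(6, 5)) and (foo,List(3, 2)), matching the question's output for N = 2.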
The result, printed with topNPerGroup.collect.foreach(println), is:
(bar,List(6, 5))
(foo,List(3, 2))
Once converted, topNPerGroup.collect.foreach(println) should print (the expected result):
(bar, 6)
(bar, 5)
(foo, 3)
(foo, 2)
Recommended answer
Your question is a little confusing, but I think this does what you're looking for:
val flattenedTopNPerGroup =
topNPerGroup.flatMap({case (key, numbers) => numbers.map(key -> _)})
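The flatMap above turns each (key, List(v1, v2, ...)) into (key, v1), (key, v2), and so on. The sketch below shows that step on plain Scala collections standing in for the RDD; the object name FlattenSketch and its flatten helper are hypothetical. On an actual pair RDD, Spark's PairRDDFunctions.flatMapValues should give the same result, for example topNPerGroup.flatMapValues(numbers => numbers), though that variant is not part of the original answer.

```scala
// Plain Scala collections standing in for the RDD (no Spark required).
object FlattenSketch {
  // Expand each (key, List(v1, v2, ...)) into (key, v1), (key, v2), ...
  // using the same flatMap as the answer applies to topNPerGroup.
  def flatten(grouped: Seq[(String, List[Int])]): Seq[(String, Int)] =
    grouped.flatMap { case (key, numbers) => numbers.map(key -> _) }

  def main(args: Array[String]): Unit = {
    val topNPerGroup = Seq("bar" -> List(6, 5), "foo" -> List(3, 2))
    flatten(topNPerGroup).foreach(println)
  }
}
```

Running main prints (bar,6), (bar,5), (foo,3), (foo,2). With a real RDD, the key order in the collected output depends on partitioning, so foo may appear before bar, as it does in the answer's REPL output.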
and in the REPL it prints what you want:
flattenedTopNPerGroup.collect.foreach(println)
(foo,3)
(foo,2)
(bar,6)
(bar,5)