Take top N after groupBy and treat them as RDD

Views: 108 Published: 2020/9/4 1:23:10 Tags: scala apache-spark rdd


Question

I'd like to get the top N items per key after calling groupByKey on an RDD, and then convert topNPerGroup (below) to an RDD[(String, Int)] in which the List[Int] values are flattened.

The data is:

val data = sc.parallelize(Seq("foo"->3, "foo"->1, "foo"->2,
                              "bar"->6, "bar"->5, "bar"->4))

The top N items per group are computed as:

import org.apache.spark.rdd.RDD

val topNPerGroup: RDD[(String, List[Int])] = data.groupByKey.map {
   case (key, numbers) =>
       // sort descending and keep the two largest values for each key
       key -> numbers.toList.sortBy(-_).take(2)
}

The result is

(bar,List(6, 5))
(foo,List(3, 2))

when printed with

topNPerGroup.collect.foreach(println)
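
The sort-and-take logic above can be checked without a cluster. The following is a minimal sketch on plain Scala collections, assuming the same sample data; `topNLocal` is a hypothetical name, and the local `groupBy` only stands in for `groupByKey`, so this is not the RDD code itself.

```scala
// Same sample pairs as in the question, held in a local Seq instead of an RDD.
val data = Seq("foo" -> 3, "foo" -> 1, "foo" -> 2,
               "bar" -> 6, "bar" -> 5, "bar" -> 4)

// Hypothetical local counterpart of topNPerGroup:
// group by key, sort each group's values descending, keep the first two.
val topNLocal: Map[String, List[Int]] =
  data.groupBy(_._1).map { case (key, pairs) =>
    key -> pairs.map(_._2).toList.sortBy(-_).take(2)
  }

topNLocal.foreach(println)
```

Each key maps to its two largest values in descending order; the order in which the keys print is not guaranteed.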

Once this is implemented, topNPerGroup.collect.foreach(println) should produce the following (the expected result):

(bar, 6)
(bar, 5)
(foo, 3)
(foo, 2)

Recommended answer

Your question is a little confusing, but I think this does what you're looking for. It pairs each number with its key and flattens the result:

val flattenedTopNPerGroup = 
    topNPerGroup.flatMap({case (key, numbers) => numbers.map(key -> _)})

In the REPL it prints what you want:

flattenedTopNPerGroup.collect.foreach(println)
(foo,3)
(foo,2)
(bar,6)
(bar,5)
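
The two steps (top N per group, then flatten) can also be combined into one pass. This is a sketch under assumptions, not part of the original answer: `topNFlattened` is a hypothetical helper name, and `n` is a parameter introduced here in place of the hard-coded 2.

```scala
import org.apache.spark.rdd.RDD

// Hypothetical helper (not from the original post): groups by key, keeps the
// n largest values per key, and emits one (key, value) pair per kept value.
def topNFlattened(data: RDD[(String, Int)], n: Int): RDD[(String, Int)] =
  data.groupByKey.flatMap { case (key, numbers) =>
    numbers.toList
      .sortBy(-_)      // descending order
      .take(n)         // keep at most n values for this key
      .map(key -> _)   // pair each kept value with its key again
  }
```

With the `data` RDD from the question, `topNFlattened(data, 2).collect.foreach(println)` should print the same four pairs. Values within a key come out in descending order, but the order of the keys depends on partitioning, which is why the question and the answer list foo and bar in different orders.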
