Scala Spark reverse grouping of groupBy
Question
I'm trying to reverse (flatten out) a grouping created on an RDD in Scala, like the one in this example: https://backtobazics.com/big-data/spark/apache-spark-groupby-example/
Basically, what I have is a key-value pair where the value is a list, and I want to flatten that out. I can't figure out how to go about it; I think the answer must lie in `flatMap` somehow, but I can't work out the syntax. Can anybody point me in the right direction, please?
Answer
You should provide some code with your question, but here is how you can flatten a `groupBy` by leveraging `flatMap` (I am using a code snippet similar to the "Spark groupBy Example Using Scala" linked above). For now, I assume you are working with an RDD of strings.
val v = Array("foo", "bar", "foobarz")
val rdd: org.apache.spark.rdd.RDD[String] = sc.parallelize(v)
val kvRDD: org.apache.spark.rdd.RDD[(String, Iterable[String])] = rdd.groupBy(x => x) // your group by function goes here
// if you explicitly want to keep the key and generate an RDD of tuples
val pairRDD: org.apache.spark.rdd.RDD[(String, String)] = kvRDD.flatMap({ case (k: String, v: Iterable[String]) => v.map(i => (k, i))})
// or if you just want to undo the grouping without preserving the key
val origRDD: org.apache.spark.rdd.RDD[String] = kvRDD.flatMap({ case (_: String, v: Iterable[String]) => v})
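As a side note, when you want to keep the key, Spark's `PairRDDFunctions` also provide `flatMapValues`, which expresses the same un-grouping more concisely than the `flatMap` + `map` combination above. A minimal sketch, reusing the `kvRDD` defined earlier:

```scala
// flatMapValues pairs each key with every element of its value collection,
// yielding one (key, element) tuple per element, just like pairRDD above
val pairRDD2: org.apache.spark.rdd.RDD[(String, String)] =
  kvRDD.flatMapValues(identity)
```

This variant also has the advantage of preserving the partitioner of `kvRDD`, since the keys are untouched.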