How to flatten a list inside an RDD?


Question

Is it possible to flatten a list inside an RDD? For example, convert:

 val xxx: org.apache.spark.rdd.RDD[List[Foo]]

to:

 val yyy: org.apache.spark.rdd.RDD[Foo]

How to do this?

Solution

val rdd = sc.parallelize(Array(List(1,2,3), List(4,5,6), List(7,8,9), List(10, 11, 12)))
// org.apache.spark.rdd.RDD[List[Int]] = ParallelCollectionRDD ...

val rddi = rdd.flatMap(list => list)
// rddi: org.apache.spark.rdd.RDD[Int] = FlatMappedRDD ...

// which is the same as rdd.flatMap(identity)
// identity is a method defined in the Predef object:
//    def identity[A](x: A): A

rddi.collect()
// res2: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
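The same approach can be written as a standalone program against the question's types. This is a minimal sketch, assuming spark-core is on the classpath and Spark runs in local mode. `Foo`, `FlattenRdd` and `flattenLists` are hypothetical names used only for illustration and are not part of Spark's API.

```scala
import scala.reflect.ClassTag

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical element type standing in for the question's Foo.
final case class Foo(id: Int)

object FlattenRdd {
  // Hypothetical helper: turns an RDD[List[A]] into an RDD[A].
  // flatMap expands each inner List into its elements; empty lists contribute nothing.
  // The ClassTag context bound is needed because RDD.flatMap requires a ClassTag for its result type.
  def flattenLists[A: ClassTag](rdd: RDD[List[A]]): RDD[A] =
    rdd.flatMap(list => list)

  def main(args: Array[String]): Unit = {
    // Local mode with 2 worker threads, so no cluster is needed.
    val conf = new SparkConf().setAppName("flatten-rdd-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val xxx: RDD[List[Foo]] =
        sc.parallelize(Seq(List(Foo(1), Foo(2)), Nil, List(Foo(3))))
      val yyy: RDD[Foo] = flattenLists(xxx)
      println(yyy.collect().mkString(", "))
    } finally {
      sc.stop() // release the local SparkContext
    }
  }
}
```

The helper only makes the element type explicit; writing `xxx.flatMap(list => list)` inline does the same thing.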
