Transform a JavaPairRDD to an RDD
Question
I need to convert my JavaPairRDD to a CSV file.
So I am thinking of transforming it to a regular RDD to solve my problem.
What I want is to have my RDD transformed from:
Key Value
Jack [a,b,c]
to:
Key Value
Jack a
Jack b
Jack c
I see that this is possible in that question and in this question (PySpark: Convert a pair RDD back to a regular RDD), so I am asking how to do the same in Java.
My JavaPairRDD is of type:
JavaPairRDD<Tuple2<String,String>, Iterable<Tuple1<String>>>
and this is the form of a row it contains:
((dr5rvey,dr5ruku),[(2,01/09/2013 00:09,01/09/2013 00:27,N,1,-73.9287262,40.75831223,-73.98726654,40.76442719,2,3.96,16,0.5,0.5,4.25,0,,21.25,1,)])
The key here is: (dr5rvey,dr5ruku)
and the value is: [(2,01/09/2013 00:09,01/09/2013 00:27,N,1,-73.9287262,40.75831223,-73.98726654,40.76442719,2,3.96,16,0.5,0.5,4.25,0,,21.25,1,)]
My original JavaRDD is of type:
JavaRDD<String>
Recommended answer
Given that the keys should be kept, you can use the flatMapValues function:
Pass each value in the key-value pair RDD through a flatMap function without changing the keys; ...
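// input: one record per key, with all of that key's values grouped in an Iterable
JavaPairRDD<Tuple2<String, String>, Iterable<Tuple1<String>>> input = ...;
// One record per (key, value); on Spark 3.x the lambda likely needs to return iter.iterator() instead
JavaPairRDD<Tuple2<String, String>, Tuple1<String>> output1 = input.flatMapValues(iter -> iter);
// Unwrap the Tuple1 so that each value is a plain String
JavaPairRDD<Tuple2<String, String>, String> output2 = output1.mapValues(t1 -> t1._1());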
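To make the effect of the flattening concrete without needing a Spark cluster, here is a minimal plain-Java sketch of the same idea applied to the Jack example from the question. It is not Spark code: the class name FlattenPairsSketch and the helper toCsvLines are hypothetical, and an ordinary Map stands in for the pair RDD. It emits one "key,value" line per element of each grouped value, which is the shape needed for the CSV.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlattenPairsSketch {

    // Hypothetical helper: emits one "key,value" line for every element of each
    // grouped value list, mirroring flatMapValues followed by a map to String.
    public static List<String> toCsvLines(Map<String, List<String>> grouped) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            for (String value : entry.getValue()) {
                // The key is repeated unchanged for each value.
                lines.add(entry.getKey() + "," + value);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // A local Map stands in for the pair RDD (assumption for illustration only).
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        grouped.put("Jack", Arrays.asList("a", "b", "c"));

        // Prints:
        // Jack,a
        // Jack,b
        // Jack,c
        toCsvLines(grouped).forEach(System.out::println);
    }
}
```

In Spark itself, the analogous last step would presumably be to map each flattened pair to such a string and write the resulting JavaRDD<String> with saveAsTextFile. Values that themselves contain commas, as in the sample row from the question, would need quoting or a different delimiter.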