Transform Java-Pair-Rdd to Rdd


Question

I need to transform my JavaPairRDD to a CSV file, so I am thinking of transforming it to a regular RDD to solve my problem.

What I want is to have my RDD transformed from:

Key   Value
Jack  [a,b,c]

to:

Key  value
Jack  a
Jack  b
Jack  c

I have seen that this is possible in that issue and in this issue (PySpark: Convert a pair RDD back to a regular RDD), so I am asking how to do that in Java.

My JavaPairRDD has the type:

JavaPairRDD<Tuple2<String,String>, Iterable<Tuple1<String>>>

This is the form of a row it contains:

((dr5rvey,dr5ruku),[(2,01/09/2013 00:09,01/09/2013 00:27,N,1,-73.9287262,40.75831223,-73.98726654,40.76442719,2,3.96,16,0.5,0.5,4.25,0,,21.25,1,)])

Here the key is (dr5rvey,dr5ruku) and the value is [(2,01/09/2013 00:09,01/09/2013 00:27,N,1,-73.9287262,40.75831223,-73.98726654,40.76442719,2,3.96,16,0.5,0.5,4.25,0,,21.25,1,)].

My original JavaRDD has the type:

JavaRDD<String>

Recommended answer

Assuming the keys should be kept, you can use the flatMapValues function, which the Spark documentation describes as follows:

Pass each value in the key-value pair RDD through a flatMap function without changing the keys; ...

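JavaPairRDD<Tuple2<String, String>, Iterable<Tuple1<String>>> input = ...;
// Emit one (key, element) pair per element of each Iterable value; the key is unchanged.
// On Spark 3.x the lambda is expected to return an Iterator instead: iter -> iter.iterator()
JavaPairRDD<Tuple2<String, String>, Tuple1<String>> output1 = input.flatMapValues(iter -> iter);
// Unwrap each Tuple1 so the value is the plain String it holds.
JavaPairRDD<Tuple2<String, String>, String> output2 = output1.mapValues(t1 -> t1._1());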
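To make the effect of flatMapValues concrete, here is a minimal Spark-free sketch of the same reshaping on plain Java collections, using the Jack / [a,b,c] example from the question. The class FlattenPairs and the method toCsvLines are hypothetical names chosen for illustration and are not part of Spark. The sketch also assumes that a simple comma join is acceptable as CSV, that is, no quoting or escaping of values that themselves contain commas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Spark: mimics flatMapValues on plain collections.
class FlattenPairs {

    // Emits one "key,element" line per element of each value list, keeping the key unchanged.
    // Assumption: values need no CSV quoting or escaping.
    static List<String> toCsvLines(Map<String, List<String>> pairs) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : pairs.entrySet()) {
            for (String element : entry.getValue()) {
                lines.add(entry.getKey() + "," + element);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, List<String>> pairs = new LinkedHashMap<>();
        pairs.put("Jack", Arrays.asList("a", "b", "c"));
        // Prints Jack,a then Jack,b then Jack,c, one per line.
        toCsvLines(pairs).forEach(System.out::println);
    }
}
```

In Spark itself, the analogous last step would probably be a map over the flattened pair RDD that formats each Tuple2 as one string, giving a JavaRDD<String> that can then be written out as text.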
