How to convert JavaRDD<List<String>> to JavaPairRDD<String, String>
Problem description
I have a JavaRDD, and when I print it, the data looks like this:
[[String1,String2,String3],[String4],[String5,String6],[String7,String8,String9]]
Each String is itself a pipe-separated string, and I can split each one to form a key and a value.
How can I convert this RDD to a JavaPairRDD?
Assuming you have data like this in a JavaRDD<List<String>>:
List_0: ["sub10~sub11~sub12","sub20~sub21~sub22","sub30~sub31~sub32"]
List_1: ["sub40~sub41~sub42"]
where ~ is the separator, and you want to flatten the lists, join the first and third substrings of each input string with | to form its key, and store the resulting pairs in a JavaPairRDD<String,String>:
key: "sub10|sub12" value: "sub10~sub11~sub12"
You could achieve this by using flatMap and then mapToPair:
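import java.util.List;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

// Flatten each List<String> into its individual strings, then key each string.
JavaPairRDD<String, String> pairs = rdd.flatMap(new FlatMapFunction<List<String>, String>() {
    // Spark 1.x signature. From Spark 2.0 on, call returns Iterator<String>, so use "return li.iterator();".
    public Iterable<String> call(List<String> li) throws Exception {
        return li;
    }
}).mapToPair(new PairFunction<String, String, String>() {
    public Tuple2<String, String> call(String s) throws Exception {
        String[] ss = s.split("~"); // assumes at least three "~"-separated fields
        return new Tuple2<String, String>(ss[0] + "|" + ss[2], s); // key = first|third, value = original string
    }
});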
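The flatten-then-pair logic can be checked locally before running it on a cluster. The following is only a minimal sketch using plain Java collections and streams rather than Spark. The class name PairSketch and the methods toPair and toPairs are hypothetical names introduced for illustration, and Map.Entry stands in for Tuple2. It assumes every input string has at least three ~-separated fields and fails loudly otherwise.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of Spark: mirrors flatMap + mapToPair on local lists.
final class PairSketch {

    // Build the (key, value) pair for one "~"-separated string.
    // Key is first field + "|" + third field; value is the untouched input.
    static Map.Entry<String, String> toPair(String s) {
        String[] ss = s.split("~");
        if (ss.length < 3) {
            throw new IllegalArgumentException("expected at least 3 fields: " + s);
        }
        return new SimpleEntry<>(ss[0] + "|" + ss[2], s);
    }

    // Flatten the nested lists (the flatMap step), then pair each string (the mapToPair step).
    static List<Map.Entry<String, String>> toPairs(List<List<String>> lists) {
        return lists.stream()
                .flatMap(List::stream)
                .map(PairSketch::toPair)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<String>> data = Arrays.asList(
                Arrays.asList("sub10~sub11~sub12", "sub20~sub21~sub22", "sub30~sub31~sub32"),
                Arrays.asList("sub40~sub41~sub42"));
        for (Map.Entry<String, String> e : toPairs(data)) {
            System.out.println("key: " + e.getKey() + " value: " + e.getValue());
        }
    }
}
```

The explicit length check is a design choice for this sketch: inside a Spark job, a short row would otherwise surface as an ArrayIndexOutOfBoundsException from an executor, which is harder to trace back to the offending input.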