Saving an RDD pair in a particular format in the output file

Question

I have a JavaPairRDD, let's say data, of type

<Integer,List<Integer>>

When I call data.saveAsTextFile("output"), the output contains the data in the following format:

(1,[1,2,3,4])

(1,[1,2,3,4])

etc.

I want something like this in the output file:

1 1,2,3,4

1 1,2,3,4

i.e. 1\t1,2,3,4

Any help would be appreciated.

Recommended answer

You need to understand what's happening here. You have an RDD[(T, U)], where T and U are some object types; read it as an RDD of tuples of T and U. When you call saveAsTextFile() on this RDD, it essentially converts each element of the RDD to a string and writes one string per line, which is how the text file is generated.

Now, how is an object of some type T converted to a string? By calling toString() on it. This is why you get [] representing the List and () representing the tuple as a whole.
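To make that concrete, here is a minimal plain-Java sketch with no Spark dependency. The class and method names (DefaultToString, render) are made up for illustration, and the tuple's parentheses are reproduced by hand rather than by a real Tuple2, so treat it as an approximation of what ends up in each output line.

```java
import java.util.Arrays;
import java.util.List;

public class DefaultToString {
    // Hypothetical helper: imitates a tuple's "(key,value)" rendering by hand.
    // String concatenation calls toString() on the List implicitly.
    static String render(int key, List<Integer> values) {
        return "(" + key + "," + values + ")";
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4);
        // The brackets come from the List's own toString().
        System.out.println(values);
        // The parentheses stand in for the tuple wrapping key and list.
        System.out.println(render(1, values));
    }
}
```

Note that a Java List's toString() also puts a space after each comma, so the brackets are not the only thing the default rendering adds.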

The solution is to map each element of your RDD to a string in the format you want before saving. I'm not that familiar with the Java syntax, but in Scala I would do something like:

rdd.map(e=>s"${e._1}\t${e._2.mkString(",")}")

Here mkString joins the elements of a collection into a single string using the given delimiter.
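Since the question uses the Java API, the same per-element formatting could be sketched in Java roughly as follows. This is plain Java with no Spark dependency; PairFormatter and formatPair are hypothetical names, and Collectors.joining plays the role that mkString plays in the Scala line.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class PairFormatter {
    // Hypothetical helper: builds "key<TAB>v1,v2,..." for one pair.
    static String formatPair(int key, List<Integer> values) {
        // Join the values with commas, without brackets or spaces.
        String joined = values.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(","));
        return key + "\t" + joined;
    }

    public static void main(String[] args) {
        // Prints the key, a tab, then the comma-separated values.
        System.out.println(formatPair(1, Arrays.asList(1, 2, 3, 4)));
    }
}
```

In a Spark job, one would presumably apply this to each element before saving, along the lines of data.map(t -> PairFormatter.formatPair(t._1(), t._2())).saveAsTextFile("output"). That last line is an untested assumption about how the pieces fit together, not something taken from the original answer.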

Let me know if this helped. Cheers.
