Saving an RDD pair in a particular format in the output file
Question
I have a JavaPairRDD, let's say data, of type
<Integer,List<Integer>>
When I call data.saveAsTextFile("output"), the output contains the data in the following format:
(1,[1,2,3,4])
(1,[1,2,3,4])
etc.
I want something like this in the output file:
1 1,2,3,4
1 1,2,3,4
i.e. 1\t1,2,3,4
Any help would be appreciated.
Recommended answer
You need to understand what's happening here. You have an RDD[(T, U)], where T and U are some object types; read it as an RDD of tuples of T and U. When you call saveAsTextFile() on this RDD, it converts each element of the RDD to a string, and those strings are what end up in the output text file.
Now, how is an object of some type T converted to a string? By calling toString() on it. This is why you see [] representing the List and () representing the tuple as a whole.
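To make the toString() point concrete, here is a minimal plain-Java sketch with no Spark dependency. The class name ToStringDemo and the helper render are hypothetical, introduced only for illustration; the sketch just shows that a java.util list renders itself with square brackets.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; not part of Spark or of the original post.
final class ToStringDemo {

    // String.valueOf calls element.toString() for any non-null object.
    static String render(Object element) {
        return String.valueOf(element);
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4);
        // The brackets come from the list's own toString() implementation.
        System.out.println(render(values));
    }
}
```

The tuple's parentheses come from its toString() in the same way, so changing the output format means producing the string yourself instead of relying on these defaults.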
The solution is to map each element in your RDD to a string in the format you want, and then save the mapped RDD. I'm not that familiar with the Java syntax, but in Scala I would do something like:
rdd.map(e=>s"${e._1}\t${e._2.mkString(",")}")
Here mkString joins the elements of a Scala collection into a single string, using the given delimiter.
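Since the question uses the Java API, here is a rough Java counterpart of the formatting step, as a sketch rather than a tested Spark job. The class PairFormatter and the method formatPair are hypothetical names; the formatting itself is plain Java, and the Spark call is shown only in a comment, assuming data is the JavaPairRDD<Integer, List<Integer>> from the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper; the names are not from the original post.
final class PairFormatter {

    // Builds "key<TAB>v1,v2,..." for one (key, values) pair.
    static String formatPair(Integer key, List<Integer> values) {
        String joined = values.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(","));
        return key + "\t" + joined;
    }

    public static void main(String[] args) {
        // Prints the key, a tab, then the comma-separated values.
        System.out.println(formatPair(1, Arrays.asList(1, 2, 3, 4)));

        // Assumed usage with Spark on the classpath (not run here):
        // data.map(t -> PairFormatter.formatPair(t._1(), t._2()))
        //     .saveAsTextFile("output");
    }
}
```

Collectors.joining plays the role that mkString plays in the Scala one-liner. Keeping the formatting in a static method makes it easy to check outside of a Spark job.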
Let me know if this helped. Cheers.