Convert RDD of Array(Row) to RDD of Row?
Question
I have data like this in a file, and I'd like to compute some statistics on it using Spark.
File content:
aaa|bbb|ccc
ddd|eee|fff|ggg
I need to assign each line an ID. I read the file as an RDD and use zipWithIndex().
Then the lines should look like:
(0, aaa|bbb|ccc)
(1, ddd|eee|fff|ggg)
I need to associate each string with the ID of its line. I can get an RDD of Array(Row), but I can't get the rows out of the array.
How should I modify my code?
import org.apache.spark.sql.{Row, SparkSession}

val fileRDD = spark.sparkContext.textFile(filePath)
// zipWithIndex() yields (line, index) pairs; swap them so each record is (id, line)
val fileWithIdRDD = fileRDD.zipWithIndex().map { case (line, idx) => (idx, line) }
// make the line like this: (0, aaa), (0, bbb), (0, ccc)
// each record is an Array[Row]
fileWithIdRDD.map(x => {
  val id = x._1
  val str = x._2
  val strArr = str.split("\\|")
  val rowArr = strArr.map(y => {
    Row(id, y)
  })
  rowArr
})
Now it looks like:
[(0, aaa), (0, bbb), (0, ccc)]
[(1, ddd), (1, eee), (1, fff), (1, ggg)]
But in the end I want:
(0, aaa)
(0, bbb)
(0, ccc)
(1, ddd)
(1, eee)
(1, fff)
(1, ggg)
Recommended answer
You just need to flatten the RDD:
yourRDD.flatMap(array => array)
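To see what flattening does, here is a minimal sketch that uses plain Scala collections as a stand-in for the RDD and tuples as a stand-in for Row (both substitutions are assumptions made so the snippet runs without Spark; the names nested and flat are illustrative only). RDD.flatMap follows the same pattern: the function returns a collection per record, and the results are concatenated in order.

```scala
// Stand-in for RDD[Array[Row]]: one array of (id, token) pairs per input line.
val nested: Seq[Array[(Long, String)]] = Seq(
  Array((0L, "aaa"), (0L, "bbb"), (0L, "ccc")),
  Array((1L, "ddd"), (1L, "eee"), (1L, "fff"), (1L, "ggg"))
)

// flatMap with the identity function concatenates the inner arrays,
// so every pair becomes its own top-level element.
val flat: Seq[(Long, String)] = nested.flatMap(array => array)

flat.foreach(println)
```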
Applied to your code (with some errors fixed in the inner map and in the assignment of id and str):
fileWithIdRDD.map(x => {
  val id = x._1
  val str = x._2
  val strArr = str.split("\\|")
  val rowArr = strArr.map(y => {
    Row(id, y)
  })
  rowArr
}).flatMap(array => array)
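Since a map followed by flatMap(array => array) is equivalent to a single flatMap, the two steps can probably be merged. The following is only a sketch, not part of the original answer: it assumes Spark is on the classpath, runs in local mode, and builds the input with parallelize instead of reading filePath; the names lines and rows are illustrative.

```scala
import org.apache.spark.sql.{Row, SparkSession}

// Local session for the sketch; in spark-shell the `spark` value already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("flatten-rows-sketch")
  .getOrCreate()

// Sample data from the question, used here instead of textFile(filePath).
val lines = spark.sparkContext.parallelize(Seq("aaa|bbb|ccc", "ddd|eee|fff|ggg"))

// zipWithIndex() yields (line, index); flatMap emits one Row per token,
// so no intermediate RDD[Array[Row]] is created.
val rows = lines.zipWithIndex().flatMap { case (line, id) =>
  line.split("\\|").map(token => Row(id, token))
}

rows.collect().foreach(println)
```

Note that zipWithIndex() produces Long indices, so the first field of each Row is a Long in this version.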
A simple example:
Input
fileWithIdRDD.collect
res30: Array[(Int, String)] = Array((0,aaa|bbb|ccc), (1,ddd|eee|fff|ggg))
Execution
scala> fileWithIdRDD.map(x => {
         val id = x._1
         val str = x._2
         val strArr = str.split("\\|")
         val rowArr = strArr.map(y => {
           Row(id, y)
         })
         rowArr
       }).flatMap(array => array)
res31: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[17] at flatMap at <console>:35
Output
scala> res31.collect
res32: Array[org.apache.spark.sql.Row] = Array([0,aaa], [0,bbb], [0,ccc], [1,ddd], [1,eee], [1,fff], [1,ggg])