如何在转星火的RDD [英] How to transpose an RDD in Spark
问题描述
我有一个这样的RDD:
I have an RDD like this:
1 2 3
4 5 6
7 8 9
有一个矩阵。现在我想转的RDD是这样的:
It is a matrix. Now I want to transpose the RDD like this:
1 4 7
2 5 8
3 6 9
我怎样才能做到这一点?
How can I do this?
推荐答案
假设你有一个N&次;×M矩阵
Say you have an N×M matrix.
如果N和M都非常小,你可以按住N'倍;存储器M的项目,它并没有多大意义,使用RDD。但换位很容易:
If both N and M are so small that you can hold N×M items in memory, it doesn't make much sense to use an RDD. But transposing it is easy:
val rdd = sc.parallelize(Seq(Seq(1, 2, 3), Seq(4, 5, 6), Seq(7, 8, 9)))
val transposed = sc.parallelize(rdd.collect.toSeq.transpose)
如果N或M是如此之大,你不能在内存中保留N或者M的条目,那么你就不能有这种规模的RDD线。无论是原始或转置矩阵是不可能在此情况下,重新present
If N or M is so large that you cannot hold N or M entries in memory, then you cannot have an RDD line of this size. Either the original or the transposed matrix is impossible to represent in this case.
N和M可以是一个中等规模的:你可以在内存中保留N或者M条目,但你不能持有N'次,M项。在这种情况下,你必须炸毁矩阵,并再次把它在一起:
N and M may be of a medium size: you can hold N or M entries in memory, but you cannot hold N×M entries. In this case you have to blow up the matrix and put it together again:
val rdd = sc.parallelize(Seq(Seq(1, 2, 3), Seq(4, 5, 6), Seq(7, 8, 9)))
// Split the matrix into one number per line.
val byColumnAndRow = rdd.zipWithIndex.flatMap {
case (row, rowIndex) => row.zipWithIndex.map {
case (number, columnIndex) => columnIndex -> (rowIndex, number)
}
}
// Build up the transposed matrix. Group and sort by column index first.
val byColumn = byColumnAndRow.groupByKey.sortByKey().values
// Then sort by row index.
val transposed = byColumn.map {
indexedRow => indexedRow.toSeq.sortBy(_._1).map(_._2)
}
这篇关于如何在转星火的RDD的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!