Matrix Transpose on RowMatrix in Spark


Question

Suppose I have a RowMatrix.

  1. How can I transpose it? The API documentation does not seem to have a transpose method.
  2. Matrix has a transpose() method, but it is not distributed. If I have a matrix larger than memory, how can I transpose it?
  3. I have converted a RowMatrix to a DenseMatrix as follows:

DenseMatrix Mat = new DenseMatrix(m,n,MatArr);

This requires converting the RowMatrix to a JavaRDD and then converting the JavaRDD to an array.

Is there any other convenient way to do the conversion?

Thanks in advance.

Recommended answer

You are correct: there is no RowMatrix.transpose() method. You will need to do this operation manually.

Here is the non-distributed (local) matrix version:

def transpose(m: Array[Array[Double]]): Array[Array[Double]] = {
    (for {
      c <- m(0).indices
    } yield m.map(_(c)) ).toArray
}
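
As a quick sanity check, the local version can be compared against the standard library's own `transpose` on nested arrays. This is a minimal sketch assuming a rectangular, non-empty input; `transposeLocal` and `sample` are illustrative names that do not come from the original answer.

```scala
// Same logic as the local transpose above, renamed so it is not confused
// with the built-in transpose method on nested arrays.
def transposeLocal(m: Array[Array[Double]]): Array[Array[Double]] =
  m(0).indices.map(c => m.map(_(c))).toArray

// Hypothetical 2x3 sample matrix.
val sample = Array(
  Array(1.0, 2.0, 3.0),
  Array(4.0, 5.0, 6.0)
)

val byHand  = transposeLocal(sample) // 3x2 result
val builtIn = sample.transpose       // standard-library transpose

// Compare as nested Lists because Array equality is by reference.
assert(byHand.map(_.toList).toList == builtIn.map(_.toList).toList)
```

For purely local data the built-in `transpose` is usually enough; the hand-written version is mainly useful as a model for the distributed case.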

The distributed version would be along the following lines:

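    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.linalg.distributed.RowMatrix

    // Emit (column, (row, value)) for every entry, then rebuild each original
    // column as one row of the transpose. origMatRdd is assumed to be a RowMatrix.
    val transposedRows = origMatRdd.rows.zipWithIndex.flatMap { case (rvect, i) =>
        rvect.toArray.zipWithIndex.map { case (ax, j) => (j, (i, ax)) }
    }.groupByKey
    .sortByKey()
    .map { case (_, entries) =>
        Vectors.dense(entries.toSeq.sortBy(_._1).map(_._2).toArray)
    }
    val transposed = new RowMatrix(transposedRows)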

Caveat: I have not tested the above, so it may well have bugs. But the basic approach is valid, and it is similar to work I did in the past for a small linear algebra library for Spark.
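
An alternative to hand-rolling the shuffle is to route the data through MLlib's CoordinateMatrix, which does provide a transpose() method. The following is a sketch under assumptions rather than a tested recipe: it assumes a Spark version where IndexedRowMatrix.toCoordinateMatrix() and CoordinateMatrix.transpose() are available, and that the RDD order of the rows is the intended row order, since RowMatrix itself stores no row indices. The helper name `transposeRowMatrix` is hypothetical.

```scala
import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix, RowMatrix}

// Sketch: transpose a RowMatrix via CoordinateMatrix.
// Assumption: the RDD order of m.rows is the intended row order.
def transposeRowMatrix(m: RowMatrix): IndexedRowMatrix = {
  // Attach an explicit row index to every row vector.
  val indexedRows = m.rows.zipWithIndex.map { case (vec, i) => IndexedRow(i, vec) }
  new IndexedRowMatrix(indexedRows)
    .toCoordinateMatrix()  // represent the matrix as (i, j, value) entries
    .transpose()           // swap i and j on every entry
    .toIndexedRowMatrix()  // regroup entries by their new row index
}
```

The result is returned as an IndexedRowMatrix so that each row keeps its index. Calling toRowMatrix() on it drops the indices, and row order after the shuffle should not be relied on. Two further points to keep in mind: the original row count must fit in an Int, because it becomes the column count of the transpose, and rows that end up with no stored entries may be absent from the result. BlockMatrix also offers a transpose method and may suit large dense matrices better.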
