How to reconstruct the original matrix from SVD components with Spark


Question

I want to reconstruct (an approximation of) the original matrix from its SVD factors. Is there a way to do this without converting the V factor, which is a local Matrix, into a DenseMatrix?

Here is the decomposition, based on the documentation (note that the comments come from the doc example):

import org.apache.spark.mllib.linalg.Matrix
import org.apache.spark.mllib.linalg.SingularValueDecomposition
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.linalg.Vectors  // needed for Vectors.dense below
import org.apache.spark.mllib.linalg.distributed.RowMatrix

val data = Array(
  Vectors.dense(1.0, 0.0, 7.0, 0.0, 0.0),
  Vectors.dense(2.0, 0.0, 3.0, 4.0, 5.0),
  Vectors.dense(4.0, 0.0, 0.0, 6.0, 7.0))

val dataRDD = sc.parallelize(data, 2)

val mat: RowMatrix = new RowMatrix(dataRDD)

// Compute the top 5 singular values and corresponding singular vectors.
val svd: SingularValueDecomposition[RowMatrix, Matrix] = mat.computeSVD(5, computeU = true)
val U: RowMatrix = svd.U  // The U factor is a RowMatrix.
val s: Vector = svd.s  // The singular values are stored in a local dense vector.
val V: Matrix = svd.V  // The V factor is a local dense matrix.

To reconstruct the original matrix, I have to compute U * diagonal(s) * transpose(V).
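Written out, the product above looks as follows. This is only a sketch of the standard truncated-SVD identity; the symbols m, n and k (row count, column count, and number of singular values kept) are names introduced here for illustration and do not appear in the question.

```latex
\[
A \approx U S V^{\top}, \qquad
U \in \mathbb{R}^{m \times k},\quad
S = \operatorname{diag}(s_1, \dots, s_k) \in \mathbb{R}^{k \times k},\quad
V \in \mathbb{R}^{n \times k}
\]
\[
A_{ij} \approx \sum_{l=1}^{k} U_{il}\, s_l\, V_{jl},
\qquad 1 \le i \le m,\; 1 \le j \le n
\]
```

For the example data, m = 3 and n = 5, so the reconstructed matrix has the same 3 x 5 shape as the input.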

The first step is to convert the singular value vector s into a diagonal matrix S.

import org.apache.spark.mllib.linalg.Matrices
val S = Matrices.diag(s)

But when I try to compute U * diagonal(s) * transpose(V) like this:

val dataApprox = U.multiply(S.multiply(V.transpose))

I get the following error:

error: type mismatch; found: org.apache.spark.mllib.linalg.Matrix required: org.apache.spark.mllib.linalg.DenseMatrix

It compiles if I convert the Matrix V into a DenseMatrix Vdense:

import org.apache.spark.mllib.linalg.DenseMatrix
val Vdense = new DenseMatrix(V.numRows, V.numCols,  V.toArray)
val dataApprox = U.multiply(S.multiply(Vdense.transpose))

Is there a way to get dataApprox, the approximation of the original matrix, from the output of the SVD without this conversion?

Recommended answer

The following code worked for me:

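import scala.collection.mutable.ListBuffer
import org.apache.spark.mllib.linalg.DenseMatrix

// numTopSingularValues = number of features (singular values) kept by the SVD.
// The original answer does not show its definition; taking it from s.size is an assumption.
val numTopSingularValues = s.size
val latentFeatureArray=s.toArray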

//Making a ListBuffer to Make a DenseMatrix for s
var denseMatListBuffer=ListBuffer.empty[Double]
val zeroListBuffer=ListBuffer.empty[Double]
var addZeroIndex=0
while (addZeroIndex < numTopSingularValues )
  {
    zeroListBuffer+=0.0D
    addZeroIndex+=1
  }
var addDiagElemIndex=0
while(addDiagElemIndex<(numTopSingularValues-1))
  {
    denseMatListBuffer+=latentFeatureArray(addDiagElemIndex)
    denseMatListBuffer.appendAll(zeroListBuffer)
    addDiagElemIndex+=1
  }
denseMatListBuffer+=latentFeatureArray(numTopSingularValues-1)

val sDenseMatrix=new DenseMatrix(numTopSingularValues,numTopSingularValues,denseMatListBuffer.toArray)

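// V * S is allowed because Matrix.multiply accepts a DenseMatrix argument.
val vMultiplyS=V.multiply(sDenseMatrix)

// (V * S)^T equals S * V^T, because the diagonal matrix S is symmetric.
val postMulWithUDenseMat=vMultiplyS.transpose

// U * (S * V^T) is the approximation of the original matrix.
val dataApprox=U.multiply(postMulWithUDenseMat)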
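The ListBuffer loops in the answer build the diagonal matrix as a flat column-major array, which is the layout the DenseMatrix constructor takes. As a minimal sketch of the same idea in plain Scala (no Spark needed), the helper below fills that array directly. The name diagColumnMajor and the sample singular values are hypothetical and used only for illustration.

```scala
// Build the n x n matrix diag(s) as a flat column-major array.
// Plain Scala sketch; diagColumnMajor is a hypothetical helper name.
def diagColumnMajor(s: Array[Double]): Array[Double] = {
  val n = s.length
  val values = Array.fill(n * n)(0.0)
  // In column-major order, entry (i, i) sits at offset i * n + i = i * (n + 1).
  for (i <- 0 until n) values(i * (n + 1)) = s(i)
  values
}

// Made-up singular values, chosen only to show the layout.
val example = diagColumnMajor(Array(3.0, 2.0, 1.0))
println(example.mkString(", "))
// prints "3.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 1.0"
```

The resulting array could be passed to new DenseMatrix(n, n, values) in place of denseMatListBuffer.toArray. The rest of the answer then relies on (V * S)^T = S * V^T, which holds because a diagonal matrix equals its own transpose.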
