SVD of a very large matrix in R


Question

I have a 60,000 x 60,000 matrix in a txt file, and I need to compute the SVD of this matrix. I use R, but I don't know whether R can handle it.

Recommended answer

I think it's possible to compute a (partial) SVD using the irlba package together with bigmemory and bigalgebra, without using a lot of memory.
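To make the memory concern concrete, here is a rough back-of-the-envelope sketch. It assumes the matrix is stored densely as 8-byte doubles and that only k = 2 singular triplets are wanted (k is an illustrative choice that mirrors the nu = 2, nv = 2 call used later in this answer).

```r
## Rough memory estimate, assuming dense storage with 8-byte doubles
n <- 60000

bytes_dense <- n^2 * 8        # the matrix itself
bytes_dense / 1e9             # 28.8 (decimal GB)

## A partial SVD with k triplets only returns two n x k matrices
## plus k singular values (k = 2 is an illustrative assumption)
k <- 2
bytes_partial <- (2 * n * k + k) * 8
bytes_partial / 1e6           # about 1.92 (decimal MB)
```

A full svd() would additionally return two more 60,000 x 60,000 matrices, which is why a partial, file-backed approach is attractive here.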

First, let's create a 20000 x 20000 matrix and save it to a file.

library(bigmemory)
library(bigalgebra)
library(irlba)

## Write 20 blocks of 1000 rows each through a single open connection
con <- file("mat.txt", open = "a")
for (i in seq_len(20)) {
    x <- matrix(rnorm(1000 * 20000), nrow = 1000)
    write.table(x, file = con,
                row.names = FALSE, col.names = FALSE)
}
close(con)

file.info("mat.txt")$size
## [1] 7.264e+09   (about 7.3 GB)

Then you can read the file with bigmemory::read.big.matrix

bigm <- read.big.matrix("mat.txt", sep = " ",
                        type = "double",
                        backingfile = "mat.bk",
                        backingpath = "/tmp",
                        descriptorfile = "mat.desc")

str(bigm)
## Formal class 'big.matrix' [package "bigmemory"] with 1 slots
##   ..@ address:<externalptr>

dim(bigm)
## [1] 20000 20000

bigm[1:3, 1:3]
##            [,1]     [,2]     [,3]
## [1,] -0.3623255 -0.58463 -0.23172
## [2,] -0.0011427  0.62771  0.73589
## [3,] -0.1440494 -0.59673 -1.66319

Now we can use the excellent irlba package, as explained in the package vignette.

The first step consists of defining a matrix multiplication operator that works with big.matrix objects; that operator is then passed to the irlba::irlba function.

### vignette("irlba", package = "irlba") # for more info

matmul <- function(A, B, transpose=FALSE) {
    ## Bigalgebra requires matrix/vector arguments
    if(is.null(dim(B))) B <- cbind(B)

    if(transpose)
        return(cbind((t(B) %*% A)[]))

    cbind((A %*% B)[])
}
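
As a side note on why a custom multiplication operator is enough: iterative SVD methods only need products of the form A v and t(A) u, never the matrix elements themselves. The sketch below illustrates this with a plain power iteration on a small in-memory matrix. It is not the algorithm irlba implements (irlba uses implicitly restarted Lanczos bidiagonalization), and the names mult_fun and top_singular are hypothetical helpers introduced only for this illustration.

```r
## Hypothetical helper: A %*% x, or t(A) %*% x when transpose = TRUE
mult_fun <- function(A, x, transpose = FALSE) {
    if (transpose) crossprod(A, x) else A %*% x
}

## Power iteration for the leading singular triplet; A is only ever
## touched through the supplied multiplication function `mult`
top_singular <- function(A, mult, iters = 100) {
    v <- rnorm(ncol(A))
    v <- v / sqrt(sum(v^2))
    for (i in seq_len(iters)) {
        u <- mult(A, v)                     # u = A v
        u <- u / sqrt(sum(u^2))
        v <- mult(A, u, transpose = TRUE)   # v = t(A) u
        sigma <- sqrt(sum(v^2))             # converges to the top singular value
        v <- v / sigma
    }
    list(d = sigma, u = drop(u), v = drop(v))
}

## Small test matrix built to have singular values 10, 5, 2, 1
set.seed(1)
U <- qr.Q(qr(matrix(rnorm(200 * 4), nrow = 200)))
V <- qr.Q(qr(matrix(rnorm(50 * 4), nrow = 50)))
A <- U %*% diag(c(10, 5, 2, 1)) %*% t(V)

res <- top_singular(A, mult_fun)
c(power_iteration = res$d, base_svd = svd(A)$d[1])
```

Because the function never indexes A directly, the same loop would work with any object for which a suitable multiplication function can be written, which is the idea behind passing matmul to irlba.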

dim(bigm)

system.time(
S <- irlba(bigm, nu = 2, nv = 2, matmul = matmul)
)

##    user  system elapsed 
## 169.820   0.923 170.194


str(S)
## List of 5
##  $ d    : num [1:2] 283 283
##  $ u    : num [1:20000, 1:2] -0.00615 -0.00753 -0.00301 -0.00615 0.00734 ...
##  $ v    : num [1:20000, 1:2] 0.020086 0.012503 0.001065 -0.000607 -0.006009 ...
##  $ iter : num 10
##  $ mprod: num 310
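
The leading singular values of about 283 can be sanity-checked without rerunning the large computation. For an n x n matrix with independent standard normal entries, the largest singular value is known to concentrate near 2 * sqrt(n). The sketch below evaluates that for n = 20000 and checks the same rule on a small matrix with base R's svd(); the small size m = 400 and the seed are arbitrary choices made for this illustration.

```r
## Expected scale of the top singular value for the 20000 x 20000 case
n <- 20000
expected_top <- 2 * sqrt(n)
expected_top                    # about 282.8

## Same rule checked on a small Gaussian matrix (m is an arbitrary choice)
set.seed(42)
m <- 400
small <- matrix(rnorm(m * m), nrow = m)
observed_small <- svd(small, nu = 0, nv = 0)$d[1]
c(observed = observed_small, expected = 2 * sqrt(m))
```

The values reported in S$d are consistent with this estimate, which suggests the partial SVD converged to the right part of the spectrum.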

I forgot to set the seed to make this reproducible, but I just wanted to show that it's possible to do this in R.

Edit

If you are using a newer version of the irlba package, the code above throws an error because the matmul parameter of the irlba function has been renamed to mult. Therefore, you should replace this part of the code

S <- irlba(bigm, nu = 2, nv = 2, matmul = matmul)

with

S <- irlba(bigm, nu = 2, nv = 2, mult = matmul)

I want to thank @FrankD for pointing this out.
